WEIGHTING MERIT RATING ITEMS


JOSEPH TIFFIN AND WAYNE MUSSER
Purdue University 


“Adding the score on each trait of a merit rating scale
directly does not ‘weight’ each trait equally in an
employee’s total merit rating score.”


How are you combining the scores on each trait of your
merit rating scale to get an employee’s over-all rating?
Are you giving each trait a relative weighting or are
you weighting each trait equally? In either case, are you sure
that each trait is being weighted as you had planned? If you
are not considering the variability of the ratings on each trait,
the trait of least importance actually may be receiving the
heaviest weight.

When introducing a merit rating plan, frequently a good 
bit of thought is given to the deciding of what relative weight 
each trait should be given in determining an employee’s total 
score. But frequently, after that decision has been made by 
the use of the best judgment available in the plant, each trait 
actually is allowed to weight itself in some haphazard order— 
sometimes even in the reverse order to that planned. 

When scores are combined, regardless of their nature,
they weight themselves automatically in proportion to their
respective variabilities (standard deviations).¹

The following examples illustrate the way in which merit
rating items automatically assume a relative weighting when
the rating on each trait is added directly to get an employee’s
total rating.

¹ A standard deviation is a measure of variability of a range of
scores or ratings. The formula for its computation is:

$$\text{standard deviation} = \sqrt{\frac{\sum x^{2}}{N}}$$

where x equals the deviation of each score from the mean
of the distribution of scores, and N equals the number of scores in the
entire distribution.

Mr. A and Mr. B receive the following ratings on “industriousness”
and “knowledge of the job” on a 50-point merit
rating scale made up of twelve different traits:

             “Industriousness”   “Knowledge of the job”   Total rating
   Mr. A            40                     30                  70
   Mr. B            30                     40                  70


By adding directly the ratings on these two traits both Mr. 
A and Mr. B receive the same total rating. The question now 
is: ‘‘Is this statement of equal ratings justified?’’ Continuing 
further with our example, we shall see that it is not. 

Now let us consider the mean (average) and the standard
deviation of the ratings of all employees in this plant for
“industriousness” and “knowledge of the job”:

                          “Industriousness”   “Knowledge of the job”
   Mean                          33                     25
   Standard deviation             3                      6


We see that the mean (average) of all the ratings on “industriousness”
is 8 points higher than that on “knowledge of the
job,” but at the same time, the variability (standard deviation)
of the ratings on “knowledge of the job” is twice that
of the ratings on “industriousness.”

To compare the total ratings of Mr. A and Mr. B in variability
units we shall convert their ratings into deviations from
the mean in terms of standard deviation units (z-scores).²


² Scores expressed in terms of z-scores show how far above or below
the mean (average) of the distribution that score is located. The z-score
formula is:

$$z\text{-score} = \frac{X - M}{\text{S.D.}}$$

where X equals the individual score, M equals the mean (average) of
the distribution, and S.D. equals the standard deviation of the
distribution.

A z-score may have either a negative or a positive sign. A z-score
with a negative sign is located below the mean, while one with a positive
sign falls above the mean. All z-scores are comparable since they ordinarily
range from -3.00 to +3.00, have a mean at zero, and a standard deviation of 1.

A table of z-scores should be referred to for the interpretation of
z-score values.
RATINGS IN STANDARD DEVIATION UNITS


             “Industriousness”   “Knowledge of the job”   Total score
   Mr. A          2.33³                  0.83⁴                3.16
   Mr. B         -1.00                   2.50                 1.50


This transfer of ratings into z-score units shows that the 
total scores of Mr. A and Mr. B are not equal. Instead, Mr. A 
now has a rating considerably higher than that of Mr. B. 

The above example assumes that we wanted to weight the 
ratings on each trait equally in determining an employee’s 
total merit rating score. It does not follow that combined 
ratings necessarily should be weighted equally. Instead, per- 
haps an employee’s rating on one trait that is more important 
for success on his present job than others should be given more 
weight in determining his total merit rating. For example, in 
determining a clerk’s total merit rating score, ‘‘accuracy’’ 
probably should be given more weight than ‘‘safety.’’ 


WEIGHTING MERIT RATING ITEMS 


Using the same two merit rating traits, suppose that we
decide to give ratings on “industriousness” twice the weight
given to ratings on “knowledge of the job.” We now have the
following ratings in terms of z-scores:


UNWEIGHTED Z-SCORE RATINGS

             “Industriousness”   “Knowledge of the job”   Total rating
   Mr. A          2.33                   0.83                 3.16
   Mr. B         -1.00                   2.50                 1.50


WEIGHTED Z-SCORE RATINGS

             “Industriousness”   “Knowledge of the job”   Total rating
   Mr. A          4.66                   0.83                 5.49
   Mr. B         -2.00                   2.50                 0.50


Here, it will be seen, the ratings of Mr. A and Mr. B differ
still more markedly than when the two traits are weighted
equally.

³ Mr. A’s z-score on “industriousness” equals

$$\frac{X - M}{\text{S.D.}} = \frac{40 - 33}{3} = +2.33.$$

⁴ Mr. A’s z-score on “knowledge of the job” equals

$$\frac{X - M}{\text{S.D.}} = \frac{30 - 25}{6} = +0.83.$$

Continuing our example further, suppose we decide to give
ratings on “knowledge of the job” twice the weight of ratings
on “industriousness.” The z-score ratings are found to be:


UNWEIGHTED Z-SCORE RATINGS

             “Industriousness”   “Knowledge of the job”   Total rating
   Mr. A          2.33                   0.83                 3.16
   Mr. B         -1.00                   2.50                 1.50


WEIGHTED Z-SCORE RATINGS

             “Industriousness”   “Knowledge of the job”   Total rating
   Mr. A          2.33                   1.66                 4.00
   Mr. B         -1.00                   5.00                 4.00


In this case the total ratings in terms of variabilities (stand- 
ard deviations) are equal for Mr. A and Mr. B as they were 
in our original example when their ratings on ‘‘industrious- 
ness’’ and ‘‘knowledge of the job’’ were added directly. 

Going back to our original example where both employees 
were given an equal total rating, we find the reason for the 
equal ratings in the above example—the variability of all the 
ratings on ‘‘knowledge of the job’’ is twice that for all ratings 
on ‘‘industriousness.’’ The ratio of the variabilities (stand- 
ard deviations) of the ratings on these two traits in our first 
example is identical to the final weighting that we gave ‘‘indus- 
triousness’’ and ‘‘knowledge of the job’’ in our last example. 

In other words, the variability of ratings on “knowledge of
the job” is twice the variability on “industriousness,” and
this was the relative weight that we gave the ratings on these
two traits in the final example. In our original example this
is the relative weight that these two traits automatically assumed
when the raw scores were added directly, each trait
weighting itself according to the relative size of its variability.
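
The arithmetic of the two-trait example can be checked with a short script. This is only an illustrative sketch: the function and variable names are ours, not the authors', and the figures are the means, standard deviations, and ratings quoted above.

```python
# Illustrative sketch of the two-trait example; names are hypothetical.
MEANS = {"industriousness": 33.0, "knowledge": 25.0}
SDS = {"industriousness": 3.0, "knowledge": 6.0}
RATINGS = {
    "Mr. A": {"industriousness": 40.0, "knowledge": 30.0},
    "Mr. B": {"industriousness": 30.0, "knowledge": 40.0},
}


def z_score(x, mean, sd):
    """Deviation from the mean expressed in standard deviation units."""
    return (x - mean) / sd


def total_z(ratings, weights):
    """Weighted sum of one employee's z-scores over all traits."""
    return sum(
        weights[trait] * z_score(value, MEANS[trait], SDS[trait])
        for trait, value in ratings.items()
    )


equal_weights = {"industriousness": 1.0, "knowledge": 1.0}
# Weighting each z-score by its trait's S.D. mimics adding raw ratings.
sd_weights = dict(SDS)

for name, ratings in RATINGS.items():
    print(
        name,
        "raw total:", sum(ratings.values()),
        "equal-weight z total:", round(total_z(ratings, equal_weights), 2),
        "S.D.-weight z total:", round(total_z(ratings, sd_weights), 2),
    )
```

With equal weights the two men separate, while weighting each z-score by its trait's standard deviation brings their totals back into a tie, which is what direct addition of raw ratings does implicitly. The equal-weight total for Mr. A comes out a hundredth above the article's 3.16 because the article adds the truncated values 2.33 and 0.83.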


APPLICATION TO A MERIT RATING SCALE 


Twelve traits are included on the merit rating scale that is
used by a large steel corporation in the Midwest. The management
of this corporation had planned to give each trait
equal weight in determining an employee’s over-all score by
adding directly the rating on each of the twelve traits. For
this study merit rating scores from about 1,800 of these employees
were obtained for each of the twelve traits. In most
cases the rating of each employee represents an average of the
ratings given to him by 2 or 3 supervisors.

For each trait to be given equal weight as intended, the variability
of the ratings of all employees on each of the twelve
traits must be equal. The variabilities computed for the
ratings of these 1,800 employees (all occupations grouped together)
on each trait are shown in Table 1. Also shown in
this table is the ratio of the variability on each trait to that
of the trait having the lowest variability, “safety.”


TABLE 1


The Mean, Standard Deviation, and Rank of Standard Deviations of
Merit Ratings Given to 1,800 Employees in a Steel Mill.
(All Occupations Combined)

                                         Standard    Relative size
   Trait                        Mean     deviation   of S.D.

   Safety                       32.1       2.24         1.00
   Knowledge of the job         31.1       2.77         1.24
   Versatility                  31.5       2.88         1.29
   Accuracy                     31.9       2.69         1.20
   Productivity                 31.8       2.59         1.15
   Over-all job performance     31.7       2.63         1.18
   Industriousness              32.1       2.96         1.32
   Initiative                   31.3       3.08         1.38
   Judgment                     31.4       2.68         1.20
   Cooperation                  32.8       2.76         1.22
   Personality                  32.0       2.51         1.12
   Health                       34.2       3.14         1.40

It will be seen that an employee’s rating on “health” is
given 1.4 times as much weight in determining his total score
as “safety.” The other traits have automatically weighted
themselves in the haphazard order shown.

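The “relative size of S.D.” column in Table 1 can be recomputed from the standard deviations. The sketch below assumes, as the text states, that the column is each trait's S.D. divided by the smallest S.D.; the dictionary and function names are illustrative only.

```python
# Standard deviations as printed in Table 1 (all occupations combined).
TABLE_1_SD = {
    "Safety": 2.24,
    "Knowledge of the job": 2.77,
    "Versatility": 2.88,
    "Accuracy": 2.69,
    "Productivity": 2.59,
    "Over-all job performance": 2.63,
    "Industriousness": 2.96,
    "Initiative": 3.08,
    "Judgment": 2.68,
    "Cooperation": 2.76,
    "Personality": 2.51,
    "Health": 3.14,
}


def relative_weights(sds):
    """Ratio of each trait's S.D. to the smallest S.D. in the set."""
    smallest = min(sds.values())
    return {trait: sd / smallest for trait, sd in sds.items()}


weights = relative_weights(TABLE_1_SD)

# List traits from heaviest to lightest implicit weight.
for trait, ratio in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{trait:<26}{ratio:5.2f}")
```

Because the published standard deviations are themselves rounded, a recomputed ratio may differ from the printed column in the second decimal place for a few traits.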
To see if the traits were weighting themselves in the same
order for different occupational groups, an analysis was made
of the variabilities of ratings for five different occupational
groups: foremen, electricians, machinists, clerks, and laborers.
Table 2 shows the variabilities on each of the twelve traits for
these occupational groups. As in Table 1, the ratio of the variabilities
of each trait to the trait having the lowest variability
is also shown.

At a glance one can see that, with one exception, for none
of these groups is the ratio of relative weightings of the twelve
traits the same. “Safety” is that one exception. For all
groups except the electricians, the trait “safety” is given the
lowest weight in determining an employee’s total merit rating
score merely because the range of ratings on “safety” is the
smallest.


TABLE 2


The Mean, Standard Deviation, and Rank of Standard Deviations of
Merit Ratings Given to Five Different Occupational
Groups in a Steel Mill

(Rel. = relative size of S.D.)

                                  Foremen             Electricians           Machinists
   Trait                      Mean  S.D.  Rel.      Mean  S.D.  Rel.      Mean  S.D.  Rel.

   Safety                     33.4  2.35  1.00      32.2  2.09  1.05      32.7  2.01  1.00
   Knowledge of the job       33.0  3.17  1.35      31.9  2.28  1.14      32.5  2.29  1.14
   Versatility                32.8  3.15  1.34      31.6  2.42  1.21      31.9  2.68  1.33
   Accuracy                   33.6  3.04  1.29      31.8  2.15  1.08      32.1  2.32  1.15
   Productivity               33.3  3.03  1.29      31.7  2.04  1.02      31.9  2.30  1.14
   Over-all job performance   33.4  3.04  1.29      31.8  2.00  1.00      31.7  2.44  1.21
   Industriousness            33.3  3.32  1.41      32.1  2.43  1.22      32.4  2.60  1.29
   Initiative                 34.0  3.32  1.41      31.3  2.28  1.14      31.5  2.54  1.26
   Judgment                   32.8  2.99  1.27      31.5  2.15  1.08      31.7  2.53  1.26
   Cooperation                34.7  2.87  1.22      32.8  2.19  1.10      32.8  2.47  1.23
   Personality                32.9  2.75  1.17      32.3  2.15  1.08      32.3  2.42  1.20
   Health                     34.8  2.93  1.25      34.8  3.08  1.54      33.8  2.47  1.23

   Largest ratio                          1.41                  1.54                  1.33


TABLE 2 (Continued)

(Clerks’ means: leading digits illegible in the source.)

                                   Clerks                 Laborers
   Trait                      Mean   S.D.  Rel.      Mean  S.D.  Rel.

   Safety                     [?].0  1.99  1.00      31.0  1.61  1.00
   Knowledge of job           [?].7  2.73  1.37      30.2  2.26  1.40
   Versatility                [?].8  2.59  1.30      29.8  2.51  1.56
   Accuracy                   [?].3  2.54  1.28      30.1  2.10  1.30
   Productivity               [?].7  2.55  1.28      30.3  2.25  1.40
   Over-all job performance   [?].0  2.49  1.25      30.1  2.09  1.30
   Industriousness            [?].6  2.81  1.41      30.3  2.50  1.55
   Initiative                 [?].5  2.76  1.39      29.1  2.57  1.60
   Judgment                   [?].8  2.61  1.31      29.6  2.10  1.30
   Cooperation                [?].8  2.61  1.31      31.1  2.43  1.51
   Personality                [?].2  2.46  1.24      30.8  2.31  1.43
   Health                     [?].1  2.99  1.50      33.4  3.54  2.20

   Largest ratio                           1.50                  2.20

Logically, it would seem that the trait ‘‘safety’’ is of greater 
importance to a laborer or a machinist in a steel mill than some 
of the other traits included. Yet ‘‘safety’’ automatically is 
given the least relative weight in determining an employee’s 
total merit rating score for this group of employees. 

Another illogical finding is the weight given to “health”
scores. With the exception of the foremen and machinists, an
employee’s rating (given to him by his supervisor) on
“health” is given the heaviest weight in determining his total
merit rating score. Of all traits for a foreman to rate, it
would seem that “health” should be given the least weight.
“Health” can most accurately be rated by the doctors of the
Medical Department, not by an employee’s foreman.

From the above table, one is impressed with the haphazard
order in which the different traits are weighting themselves.
The order is neither consistent nor logical. Instead, it is haphazardly
based on the variabilities of the ratings on the different
traits. Each trait certainly is not being weighted
equally as the management had planned.


HOW TO COMBINE MERIT RATING SCORES 


How can one combine ratings on the different traits of a 
merit rating scale so that each trait will be given its predeter- 
mined weight? We found above that each trait ordinarily 
will not be given equal weight by adding the ratings directly. 
Nor can we be sure that each trait will be weighted as expected 
by weighting an employee’s ratings on the different traits 
before combining them. Instead, we have found that the 
weighting of a trait depends upon the variability of the ratings 
of all employees on that trait as compared to the variabilities 
of the other traits. What method can be used? 

The use of z-scores—comparable scores—is the answer. 
Each year when your employees are given a merit rating, with 
a few extra hours of computational work, the ratings of each 
employee can be transformed to z-scores—and then you will 
be sure of how each trait is being weighted. 

The formula for computing a z-score was given in a footnote
at the beginning of this article. It is:

$$z\text{-score} = \frac{X - M}{\text{S.D.}}$$

where X equals an employee’s merit rating score on any trait, M equals
the mean (average) score of the entire distribution of ratings
on that trait for all employees, or all employees of any particular
occupational group, and S.D. equals the standard deviation
of the same distribution of ratings. By their very nature
z-scores ordinarily range between +3.00 and -3.00.

Suppose your merit rating scale is made up of twelve dif- 
ferent traits as the one discussed in this article. Then to get 
Mr. A’s total merit rating score in z-score form, his rating on 
each trait must be transformed to its respective z-score value. 
If you wish to give each trait equal weight in determining 
Mr. A’s total score, add his z-scores on each trait directly to 
get his total merit rating score. 
But suppose for the foreman group you decide that “judgment”
should be given twice as much weight as “safety.”
Now, before adding the z-scores for Mr. A, a foreman, his z-score
on “judgment” would be multiplied by 2, and this value added
directly to his z-score on “safety.”

If a merit rating system is to serve its purpose of evaluating 
the performance of employees more objectively, the manage- 
ment interpreting the scores must know what relative weight 
an employee’s rating on each trait has been given. This weight
cannot be known unless the variabilities of ratings of each
distribution are computed.

Transforming the ratings to z-scores which are then com- 
bined, either by simple addition if all traits are to be weighted 
equally or after multiplying by appropriate weights (if this is 
considered desirable), will insure the elimination of chance 
and unknown weights attaching themselves to the merit rating 
items. 
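
The procedure recommended above might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' own computation: the ratings are invented, the helper names are ours, and the population standard deviation (divisor N) is used to match the footnoted formula. It also assumes every trait shows some variability, since a zero S.D. would make the z-score undefined.

```python
from statistics import fmean, pstdev


def to_z_scores(ratings):
    """Convert {employee: {trait: rating}} to the same shape in z-scores.

    Mean and S.D. are taken over all employees supplied, trait by trait.
    """
    traits = list(next(iter(ratings.values())))
    stats = {}
    for trait in traits:
        column = [r[trait] for r in ratings.values()]
        stats[trait] = (fmean(column), pstdev(column))  # S.D. with divisor N
    return {
        employee: {t: (r[t] - stats[t][0]) / stats[t][1] for t in traits}
        for employee, r in ratings.items()
    }


def combined_score(z_scores, weights):
    """Weighted sum of one employee's z-scores; missing weights count as 1."""
    return sum(weights.get(t, 1.0) * z for t, z in z_scores.items())


# Hypothetical ratings for four foremen on two of the twelve traits.
ratings = {
    "Foreman 1": {"judgment": 35, "safety": 33},
    "Foreman 2": {"judgment": 30, "safety": 36},
    "Foreman 3": {"judgment": 31, "safety": 30},
    "Foreman 4": {"judgment": 28, "safety": 33},
}

z = to_z_scores(ratings)
# "Judgment" deliberately given twice the weight of "safety".
planned_weights = {"judgment": 2.0, "safety": 1.0}
totals = {e: combined_score(zs, planned_weights) for e, zs in z.items()}

for employee, total in totals.items():
    print(employee, round(total, 2))
```

Computing the mean and S.D. within an occupational group, as here, follows the article's suggestion that the distribution may be that of "all employees of any particular occupational group."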





PSYCHOLOGICAL TESTS IN THE SELECTION 
OF ENROLLEES IN ENGINEERING, 
SCIENCE, MANAGEMENT, 
DEFENSE TRAINING 
COURSES 


WILLIAM McGEHEE AND D. J. MOFFIE
North Carolina State College 


Moore (1) has analyzed the characteristics of over
10,000 enrollees in Engineering, Science, Management,
Defense Training Courses in Pennsylvania.
Ruggles (2) has described a smaller group enrolled in
ESMDT Courses in North Carolina. Both reports indicate
that the enrollees are individuals capable of profiting from the
instruction offered by the ESMDT Program.

It cannot be assumed, however, that these enrollees are 
capable of profiting equally from instruction in all the various 
courses offered by the ESMDT Program. The present report 
is an attempt to analyze the performance of students, described 
by Ruggles (2), in certain ESMDT Courses in terms of scores 
made on psychological tests and grades received in the respec- 
tive courses. 

The authors are aware that the data presented suffer from
the small number of subjects available, especially in certain 
courses. It is believed, however, that the results obtained are 
suggestive and can serve as a point of departure for other 
investigators and for administrators interested in placing 
ESMDT enrollees in courses from which they can derive the 
greatest benefit. 

Certain psychological tests were administered in February,
1942, to enrollees¹ in seven different ESMDT Courses given by
the School of Engineering of N. C. State College, Raleigh,
N. C. The Otis Self-Administering Test of Mental Ability
(Higher Form A) was administered to enrollees in all the
courses. The enrollees in the various classes were also administered
two or three additional tests from a group made up
of the Bennett Mechanical Comprehension Test, Form AA,
The Revised Minnesota Paper Form Board, Series AA, The
Minnesota Vocational Test for Clerical Workers, The Iowa
Mathematics Aptitude Test, Series CA1, Revised A, and The
Iowa Chemical Aptitude Test, Series CA1, Revised A. The
names of the various courses and the tests administered to the
enrollees in each course are shown in Table 1.

¹ Ruggles (2) has given an adequate description of these enrollees. It
is, therefore, unnecessary to present these data in this report.

Zero-order and multiple coefficients of correlation were computed
between the test scores and grades made in the various
classes by the enrollees at the end of the three-month period
of instruction. These data are also shown in Table 1.

It is evident from the data in Table 1 that satisfactory bat- 
teries are found for predicting achievement in Aircraft Inspec- 
tion, Fabric Inspection, and Material Testing courses. A less 
adequate battery exists for Engineering Drawing, while the 
results are unsatisfactory for courses in Architectural Engi- 
neering, Chemical Testing, and Instrument Men and Topog- 
raphers. It is also evident that in Fabric Inspection the Iowa 
Chemical Aptitude test makes the major contribution. The 
value of the Minnesota Paper Form Board in no instance is 
very high, although engineering drawing is a major part of 
each of the courses in which the test was used. 

The data presented, while limited by the small number of 
enrollees in each course, suggest the possibility of effectively 
using psychological tests in the selection and placement of 
enrollees in Engineering, Science, Management, Defense 
Training Courses. 

BIBLIOGRAPHY

1. Moore, Bruce V. Analysis of results of tests administered to men
in engineering defense training courses. J. Appl. Psychol.,
1941, 25, 619-635.

2. Ruggles, E. W. The educational value of pre-service defense courses.
J. Eng. Educ., 1942, 32, 841-854.
A STUDY OF THE SOPHOMORE TESTING
PROGRAM AT THE UNIVERSITY OF 
MINNESOTA—PART III 


ROBERT B. SELOVER 
The Prudential Insurance Company of America 


A partial evaluation of the results of the Sophomore
Testing Program at the University of Minnesota has
been presented in two preceding articles in this journal.
These reports dealt with the average profiles of successful
major groups, the correlation between test scores and subsequent
scholastic success, and the use of these profiles in the
differentiation of successful students in different major groups.
Here we shall be concerned with the value of these tests for
selecting students to be granted degrees with honors.

At the University of Minnesota degrees with honors are 
granted in the following manner. Students who maintain an 
honor point ratio of 2.00 or better are granted a degree cum 
laude without further examination. Such students, however, 
may apply to the Honors Committee for higher honors. In 
order to qualify for a degree magna cum laude a student must 
pass an oral examination given by the Honors Committee. In 
order to qualify for a degree summa cum laude a student must 
in addition submit an acceptable original piece of work. 

Before it was possible to determine how well these tests 
would differentiate students granted honors from those re- 
jected or failed by the Honors Committee, it was first neces- 
sary to test differences between major groups for students 
who received honors. That is, it was possible that real dif- 
ferences between honor students and those who were refused 
the distinction might be concealed by differences existing be- 
tween major groups. In order to observe the value of these 
tests in selecting honors students it was, therefore, necessary 
to confine comparisons to those groups which were homo- 
geneous in test performance. 
Therefore, the significance of differences among major
groups was tested by means of analysis of variance for students
receiving degrees with higher honors.⁷ It was found
that while the performance of honors students in all major
groups could be considered homogeneous with respect to performance
on the parts of the English test and on the fine arts
test, significant differences were present on the history and
social science test, the foreign literature test, and the general
science test. The variances were found to be homogeneous for
all measures. For those measures which indicated that the
performance of honors students in all major groups could not
be combined, individual tests were calculated between major
groups in order to discover which groups could be combined.
These calculations indicated that majors in English, journalism,
Romance language, social science, social work and liberal
arts were homogeneous in their mean performance on all tests.
It was also discovered that the remaining major groups showed
non-significant differences among themselves. These included
majors in natural science, medicine, mathematics, and psychology.
The calculations were, of course, made only for the
three tests which previously indicated the presence of differences.
Students who received higher honors were then combined
into two groups, each of which was homogeneous with
respect to test performance.


TABLE 6


Differences between Honor Students and those Failed or Rejected by the
Honors Committee for Majors in English, Journalism, Romance
Language, Social Sciences, Social Work, and Liberal Arts

Group I

                               Mean for       Mean for
                               honors group   failed group                 S.E. of
                               (N = 53)       (N = 42)       Difference    difference     t

   History & Social Science     146.87         128.57          18.30          6.71       2.73
   Foreign Literature            99.34          75.19          24.15          7.51       3.22
   Fine Arts                     80.02          67.95          12.07          6.87       1.76
   Total General Culture        326.23         271.71          54.52         17.97       3.03
   English Usage                 70.60          65.17           5.43          1.20       4.53
   Spelling                      68.92          63.69           5.23          1.62       3.23
   Vocabulary                 [illegible]       72.14           4.80          1.24       3.87
   Total English              [illegible]      201.00          15.47          3.05       5.07
   General Science            [illegible]       63.05          14.80          6.62       2.24
   Total 4 yrs. H.P.R.        [illegible]        2.22           0.31          0.05       6.20


⁷ For the purpose of this paper, the term “higher honors” is used to
designate degrees granted magna cum laude and summa cum laude.

Students in each of these two groups who applied for higher 
honors but who were rejected or failed by the Honors Com- 
mittee were then selected for comparison with those who re- 
ceived higher honors. The students who were rejected or 
failed by the Honors Committee were among those granted 
degrees cum laude. 

The differences in mean performance between students who
were granted honors and those rejected or failed are shown in
Tables 6 and 7. For Group I, comprising students majoring
in English, journalism, and the social sciences, marked differences
between the two groups of students were found. A
significant difference is observed for all tests except the Fine
Arts Test. For Group II, students in mathematics, natural
science, medicine and psychology, however, only one significant
difference between those passed and those failed or rejected
was found. This is a difference in honor point ratio
for all four years’ work. We may, therefore, conclude that for
majors in medicine, natural science, mathematics and psychology,
these tests are not helpful in the selection of students
granted higher honors by the Honors Committee.


TABLE 7


Differences between Honors Students and those Failed or Rejected by the
Honors Committee for Majors in Natural Science, Medicine,
Mathematics, and Psychology

Group II

                               Mean for       Mean for
                               honors group   failed group                 S.E. of
                               (N = 53)       (N = 42)       Difference    difference     t

   History & Social Science     120.52         105.82          14.70         11.69       1.26
   Foreign Literature            65.43          59.36           6.07          9.04       0.67
   Fine Arts                     63.17          67.91          -4.75          8.93       0.53
   Total General Culture        249.13         233.09          16.04         23.70       0.68
   English Usage                 68.39          68.45          -0.06          2.27       0.03
   Spelling                      62.74          64.36          -1.62          3.00       0.54
   Vocabulary                    74.78          73.00           1.78          2.12       0.84
   Total English                205.91         205.82           0.09          5.73       0.02
   General Science              127.61         115.09          12.52         11.17       1.12
   Total 4 yrs. H.P.R.            2.48           2.23           0.25          0.07       3.57
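
The t column in Tables 6 and 7 appears to be the difference between means divided by its standard error. The following sketch checks that reading against the Table 6 rows; the variable names are ours, and the interpretation of the fourth column as a standard error of the difference is an assumption drawn from the later text.

```python
# (difference, standard error of difference, printed t) from Table 6.
TABLE_6 = {
    "History & Social Science": (18.30, 6.71, 2.73),
    "Foreign Literature": (24.15, 7.51, 3.22),
    "Fine Arts": (12.07, 6.87, 1.76),
    "Total General Culture": (54.52, 17.97, 3.03),
    "English Usage": (5.43, 1.20, 4.53),
    "Spelling": (5.23, 1.62, 3.23),
    "Vocabulary": (4.80, 1.24, 3.87),
    "Total English": (15.47, 3.05, 5.07),
    "General Science": (14.80, 6.62, 2.24),
    "Total 4 yrs. H.P.R.": (0.31, 0.05, 6.20),
}


def t_ratio(difference, standard_error):
    """Critical ratio: mean difference in standard-error units."""
    return difference / standard_error


recomputed = {name: t_ratio(d, se) for name, (d, se, _) in TABLE_6.items()}

for name, (_, _, printed) in TABLE_6.items():
    print(f"{name:<26} printed {printed:5.2f}   recomputed {recomputed[name]:6.3f}")
```

Every recomputed ratio agrees with the printed value to within rounding, which supports the column reading assumed here.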

For Group I, in which marked differences between those
passed and failed are observed, we may ask the further question:
what is the best combination of these tests to discriminate
between the two groups? In order to answer this question the
seven measures were used to maximize the difference between
the two groups by means of the discriminant function.


TABLE 8

   Test                             Constant          t

   History and Social Science       0.8958977      0.55039
   Foreign Literature              -0.2045578      0.1253
   English Usage                    4.5938334      1.0034
   Spelling                         8.6118144      1.5164
   Vocabulary                      11.8684108      1.1970
   General Science                 -0.0296304      0.0195
   Total 4 yrs. H.P.R.              7.9656734      4.9317

NOTE: These constants have been multiplied by 1000.


ANALYSIS OF VARIANCE

   Source of variation             df     Sum of squares    Mean square

   Total                                    23.431579
   Variance due to Regression       7        8.964801        1.280686
   Residual                        87       14.466778        0.1662848

F = 7.70176
For each test the weights which maximize the difference between
these two groups, with tests of their individual significance
and the analysis of variance for the discrimination, are
shown in Table 8.

When each constant is multiplied by the appropriate test
score for each student in the two groups, the resulting distributions
show the maximum difference between the two groups
on the basis of these seven measures. These distributions are
shown in Figure 8. Here students who received a degree
summa cum laude are designated by a different symbol from
those receiving a degree magna cum laude. Those rejected
without examination are also designated differently from those
who were examined but failed by the Honors Committee.
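
How a combined score is formed from the constants may be sketched as below. Several things here are assumptions rather than the author's stated procedure: the seventh measure is taken to be the four-year honor point ratio, the vocabulary constant is read as 11.8684108, the constants are used in their printed (multiplied by 1000) form, and the sample student is wholly hypothetical. The second part recomputes the F ratio from the sums of squares and degrees of freedom printed in Table 8.

```python
# Constants as printed in Table 8 (already multiplied by 1000).
CONSTANTS_X1000 = {
    "History and Social Science": 0.8958977,
    "Foreign Literature": -0.2045578,
    "English Usage": 4.5938334,
    "Spelling": 8.6118144,
    "Vocabulary": 11.8684108,
    "General Science": -0.0296304,
    "Honor Point Ratio": 7.9656734,  # assumed label for the seventh measure
}


def composite_score(scores, constants=CONSTANTS_X1000):
    """Sum of each measure's score multiplied by its constant."""
    return sum(constants[name] * scores[name] for name in constants)


def f_ratio(ss_regression, df_regression, ss_residual, df_residual):
    """F as the regression mean square over the residual mean square."""
    return (ss_regression / df_regression) / (ss_residual / df_residual)


# Hypothetical student, for illustration only.
student = {
    "History and Social Science": 140.0,
    "Foreign Literature": 90.0,
    "English Usage": 70.0,
    "Spelling": 68.0,
    "Vocabulary": 75.0,
    "General Science": 70.0,
    "Honor Point Ratio": 2.4,
}

score = composite_score(student)
f_value = f_ratio(8.964801, 7, 14.466778, 87)
print("composite score:", round(score, 2))
print("F:", round(f_value, 5))
```

The recomputed F agrees with the printed value, so the two rows of the analysis of variance are at least internally consistent.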

The value of these variables in separating honors students 
from those rejected or failed can be seen by a comparison of 
these distributions. If a vertical line is drawn between the 
tenth and eleventh intervals, thirty-nine of the fifty-three stu- 
dents granted honors are included above that line, while only 
five of the forty-two failed exceed that level of performance. 
It is seen that students receiving degrees summa cum laude 
stand well up among those granted higher honors. Those re- 
jected without examination represent the lowest level of per- 
formance. The standard error of the difference between the 
means of these two distributions and the analysis of variance 
reported in Table 8 indicate a substantial difference between 
the groups. The individual tests of significance for the seven 
weights fail to indicate a statistically significant t value of any 
of the tests when considered separately. The combined effect 
of the three parts of the English test gives evidence of increas- 
ing the discrimination, however, since a further comparison 
on the basis of honor point ratio alone reveals significantly 
more overlapping between the two groups. 

In order to observe the performance of students who were
granted a degree cum laude but who did not petition for
higher honors, the weights already calculated were multiplied
by the appropriate test scores for each individual in this
group. The resulting distribution of combined scores reveals
that while most of the scores of the non-applicants cluster at
the lower end of the scale, several exhibit performance on the
tests and honor point ratio which would make them “good
risks” as candidates for higher honors. Eighteen students
were found to have combined scores which would place them
above the eleventh interval on the preceding figure. Since
only five students out of forty-one who exceeded this level
of performance were failed by the Honors Committee, we may
conclude that these eighteen students represent a high level of
performance. Probably the most important difference between
these students and those granted degrees with higher honors
is their failure to make application for the examination.


Figure 8
A Comparison on the Basis of a Combined Score, from the Sophomore
Battery and Honor Point Ratio, between Higher Honors
Students and Those Rejected and Failed by
the Honors Committee


SUMMARY 


The results presented in these three papers are in no way a 
complete evaluation of the sophomore testing program at the 
University of Minnesota. The quantitative approach under- 
taken, however, indicates the value of these tests for the solu- 
tion of some practical problems with which counselors and 
advisers are faced. The tests give evidence of value in classi- 
fying students according to major group and show significant 
relationship to academic success in various fields of work. 
Further, it has been shown that test scores together with 
measures of scholastic success might be of considerable value 
in providing objective criteria for selecting students to be 
granted degrees with higher honors. 

The applicability of the discriminant function as a tech- 
nique for the analysis of profiles has been illustrated. It is 
hoped that these illustrations will suggest further applica- 
tions to problems of discrimination on the basis of multiple 
measures. 
AN EVALUATION OF THE TYLER-KIMBER
STUDY SKILLS TEST 


L. D. HARTSON, HARRY W. JOHNSON, II, AND
M. ELIZABETH MANSON


Oberlin College 


DESCRIPTION OF THE PROJECT 


In 1938 the Tyler-Kimber Study Skills Test (6) was included
in the battery administered to the freshmen in the College
of Arts and Sciences, Oberlin College, and a study of the 
results has been made to determine its relative prognostic 
value. An item analysis was also made to determine which 
types of items in the Tyler-Kimber Test provide the most valid 
basis for predicting first semester scholarship. The method 
employed is described by Toops (4). In this analysis Parts I 
and II were omitted because the questions in these sections 
were too easy. To check these findings the same procedure was 
followed with the class which entered in 1939. 

It is the practice at Oberlin to test all entering students. 
Those of Oriental origin and recent immigrants, who are ob- 
viously laboring under a language handicap, have, however, 
been excluded from this study. The mean test intelligence 
level is rather high, as is indicated by the fact that these two 
classes ranked second and first, respectively, in the national 
list of colleges using the American Council on Education Psy- 
chological Examination. The main point of this investigation, 
then, is to determine the value of the Tyler-Kimber Test when 
used with a highly selected freshman population. 


RESULTS 


1. Validities and Intercorrelations. 


Using the first semester grades as the scholarship criterion,
and separating the men and women, correlations have been
run with the entire battery of tests administered to the
freshmen as indicated in Table 1. These include the Ohio State
University Psychological Examination (OSU Test) and that
issued by the American Council on Education (ACE Test),
the Tyler-Kimber Test (T-K₁ Test), and an abbreviated form
of the Boyington Study Skills Test (Boyington Test) with
the class of 1943. Validity coefficients were also obtained for
High School Scholarship (H. S. Schol.) and Scholastic
Estimates (Schol. Est.), made on a rating scale of the students
at time of application for admission. Table 1 reports not only
these validity coefficients but those for two abbreviated forms
of the test (T-K₂ and T-K₃) and the intercorrelations between
the variables, with means and standard deviations.

When the four coefficients for each variable (one for each
group of subjects) are averaged, the rank order of the variables
in validity is as follows: (1) H. S. Scholarship (.583); (2) OSU
Test (.552); (3) ACE Test (.466); (4) T-K₂, 76 items (.462);
(5) Scholastic Estimates (.458); (6) T-K₃, 49 items (.453);
(7) T-K₁, total test (.434); (8) Boyington Test (.276). The OSU
Test has usually proved to be a better basis for predicting college
scholarship at Oberlin than are the High School grades, as was
reported in Buros’ 1938 Mental Measurements Yearbook (1).

Table 2 records the more significant multiple correlation
coefficients. As the validity coefficients obtained for the men
are all higher than those for the women, only the multiples
for the men have been computed. The first order multiples
range between .515 and .713. The best results were obtained
by the combination of the OSU Test with H. S. Schol. (an
average of .706 for the two classes). In this case, as with six other
classes in which comparison is possible, the OSU Test has given
higher validity coefficients than have been obtained from the
ACE Test. A combination of the 49 item T-K Test with the
OSU Test gives an average coefficient of .623; the other forms
of the T-K Test give somewhat lower figures. The addition of
H. S. Schol. to OSU Test scores and some form of the T-K
Test raises the coefficient to figures ranging between .707 and
.723. In this combination the highest figure (.723) was
obtained by using the 49 item T-K Test. When the T-K Test is
combined with H. S. Schol. and Schol. Est., it gives somewhat
lower coefficients than does the OSU Test, where the best
figures for the combination with the T-K Test average .687,
and the average for the combination with the OSU Test is .710.
Finally, the addition of Schol. Est. to H. S. Schol., OSU Test,
and the T-K₃ Test raises the figure to an average of .723.


TABLE 2

Multiple Correlation Coefficients for Different Combinations of the
Variables with First Semester Scholarship

                                                          Class of
Multiples                                               1942    1943

1.25   (T-K₁, OSU Test)                                 .617    .599
1.26   (T-K₁, ACE Test)                                 .?86    .516
1.27   (T-K₁, H. S. Schol.)                             .658    .677
1.28   (T-K₁, Schol. Est.)                              .632    .538
1.35   (T-K₂, OSU Test)                                 .642    .597
1.36   (T-K₂, ACE Test)                                 .620    .518
1.37   (T-K₂, H. S. Schol.)                             .684    .673
1.38   (T-K₂, Schol. Est.)                              .662    .537
1.45   (T-K₃, OSU Test)                                 .650    .596
1.46   (T-K₃, ACE Test)                                 .631    .515
1.47   (T-K₃, H. S. Schol.)                             .689    .672
1.48   (T-K₃, Schol. Est.)                              .674    .533
1.57   (OSU Test, H. S. Schol.)                         .698    .713
1.58   (OSU Test, Schol. Est.)                          .666    .633
1.67   (ACE Test, H. S. Schol.)                         .663    .710
1.68   (ACE Test, Schol. Est.)                    [illegible]  [illegible]
1.78   (H. S. Schol., Schol. Est.)                      .607    .663
1.257  (T-K₁, OSU Test, H. S. Schol.)                   .707    .718
1.258  (T-K₁, OSU Test, Schol. Est.)                    .684    .633
1.357  (T-K₂, OSU Test, H. S. Schol.)                   .719    .714
1.457  (T-K₃, OSU Test, H. S. Schol.)                   .723    .714
1.478  (T-K₃, H. S. Schol., Schol. Est.)                .700    .674
1.578  (OSU Test, H. S. Schol., Schol. Est.)            .704    .715
1.2578 (T-K₁, OSU Test, H. S. Schol., Schol. Est.)      .713    .715
1.4578 (T-K₃, OSU Test, H. S. Schol., Schol. Est.)      .729    .716

T-K₁ = Total Test; T-K₂ = 76 items; T-K₃ = 49 items.

When an inverted pyramid is formed of the variables in 
terms of their validities, we find that with this particular set 
of data, the base is formed of H. S. Schol., for which the 
average coefficient for the two years is .627. The addition of 
the OSU Test raises the figure to an average of .706. The next 
most productive variable is the T-K₃ Test, which raises the
figure to .719. Finally, the addition of Schol. Est. boosts the 
figure to .723. 

In an earlier study (2), the senior author reported a coeffi- 
cient of .710 obtained by using the OSU Test, H. S. Schol. and 
Estimates obtained from principals and teachers on intelli- 
gence, industry, and attitude toward school work. The average 
obtained for the two classes here reported is again .710, thus 
corroborating the earlier finding. We now find that the addi- 
tion of the T-K Test supplements the other variables by .013 
points (making .723) when we use the average figures, and 
.025 with the men of 1942 (making the multiple .729). 


2. Comparative Diagnostic Value of the Tyler-Kimber Test 
and an Intelligence Test. 


Oberlin College requires a grade average of ‘‘C’’ as a mini- 
mum for graduation. It is therefore convenient to use this 
critical level as a basis for comparing the relative effectiveness 
of the T-K Test and the better of the two intelligence tests, i.e.,
the OSU Test. At the end of the first semester there were 112 
men and 65 women in the two classes studied whose grade 
average was below C. If one uses the lower third of the test 
distribution as a basis of apprehension as to a student’s success, 
it eventuates that 27 of these 177 low-ranking students made 
low T-K Test scores and 50 of them made low OSU Test scores. 
Neither test is particularly successful in identifying the poorer 
students, but the OSU Test does nearly twice as well as the T-K
Test. Comparison may also be made between the coefficients
obtained at the Sacramento Junior College, where the T-K
Test was validated, and those obtained at Oberlin. The authors
report that the partial correlation coefficient between the
T-K Test and College Scholarship, when the ACE Test scores 
are held constant, is .262 (7, p. 7). The comparable figures 
for the men of the two classes at Oberlin, using the ACE Test, 
are .256 and .152. The correlation between the T-K Test and 
scholarship with the OSU Test held constant, for the two 
Oberlin classes, is .242 and .083. As to the validity of the 
intelligence tests: at Sacramento, the correlation between the 
ACE Test and Scholarship, with the T-K scores constant is 
.286. The comparable figures for the two groups of Oberlin 
men are: (1) using the ACE Test, .328 and .238, and (2) using 
the OSU Test, .398 and .474. 
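
The partial coefficients quoted above are of the first-order kind, in which one variable is held constant. A minimal sketch of the standard formula follows. The function name and the three zero-order correlations are hypothetical placeholders, since the zero-order values from Table 1 are not reproduced in this section.

```python
import math

def partial_corr(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Hypothetical zero-order correlations (NOT taken from the study):
# x = study skills test, y = first semester scholarship, z = intelligence test.
r_tk_schol = 0.45
r_tk_osu = 0.55
r_schol_osu = 0.55

r_partial = partial_corr(r_tk_schol, r_tk_osu, r_schol_osu)
print(f"T-K with scholarship, intelligence test held constant: {r_partial:.3f}")
```

The more closely the study skills test overlaps the intelligence test, the further the partial coefficient falls below the zero-order validity.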


3. Results of the Item Analysis. 


Although the T-K Test as a whole contributes but little to 
the validity figure obtained by a combination of the other avail- 
able variables, the fact that it has a validity represented by 
figures as high as .500 indicates that it contains some valuable 
items. An analysis was therefore made to discover these items. 
Examination of the tabulated scores for the different parts 
revealed the fact that certain sections of the test were not suffi- 
ciently differential for the Oberlin students. Thus, as may be 
noted from Table 3, so large a percentage of the students made 
perfect scores on parts I and II as to render them useless. In 
Test I (Book Organization), 70 per cent of the students made 
perfect scores, and 23 per cent made only one or two mistakes. 
In Test II (Index), 64 per cent made perfect scores, and 43 
per cent made but one or two mistakes. In Test IV (Abbrevia- 
tions), 35 per cent of the scores were perfect, and 39 per cent 
more made but one, two, or three mistakes. This part also is 
too easy for the population tested. 

TABLE 3

Quartile Ranges in Scores for the Combined Classes, 1942 and 1943

                                 Low       Third     Second    Top
                                 quarter   quarter   quarter   quarter

   I. Book organization          4-11      12        12        12
  II. Index                      3-9       10        10        10
 III. Use of references          10-21     22-23     24-26     27-30
  IV. Abbreviations              8-15      16-17     18-20     20
   V. Card catalog               6-13      14-15     16        17-20
  VI. Map interpretation         4-12      13-14     15        16-20
 VII. Current periodicals        5-19      20-22     23        24-30
VIII. Graph interpretation       3-21      22-24     25-26     27-35
      Total                      97-131    132-140   141-146   147-169


Eliminating Parts I and II, an item analysis was made of the
remainder of the test, for each sex and each year separately.
From these four sets of coefficients two selections were made
as the composition of shorter tests. The first consists of the 76
items for which there were no negative coefficients and at least
two of a value as high as 0.10. The second consists of the 49
items in which a coefficient of 0.10 or more was found in at least
three of the lists. On the average, the selection of 76 items
gives the highest validity, the figure being .462, but the
selection of 49 items also gives a higher validity (.453) than does
the test as a whole, which averages .434.
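
The two selection rules can be sketched as simple predicates over the four coefficients obtained for an item (men and women of each class). The item labels and coefficients below are invented for illustration and are not data from the study. The sketch also takes the 49-item rule exactly as worded, without the "no negative coefficients" condition, which is an assumption on our part.

```python
from typing import Dict, List

THRESHOLD = 0.10  # validity level named in the text

def in_76_item_form(coeffs: List[float]) -> bool:
    """No negative coefficient, and at least two at or above the threshold."""
    no_negatives = all(c >= 0 for c in coeffs)
    return no_negatives and sum(c >= THRESHOLD for c in coeffs) >= 2

def in_49_item_form(coeffs: List[float]) -> bool:
    """At least three of the four coefficients at or above the threshold."""
    return sum(c >= THRESHOLD for c in coeffs) >= 3

# Hypothetical item-validity coefficients, one per list (sex x class).
items: Dict[str, List[float]] = {
    "item_a": [0.12, 0.15, 0.11, 0.20],
    "item_b": [0.12, 0.02, 0.14, 0.05],
    "item_c": [0.05, 0.02, 0.01, 0.08],
}

for name, coeffs in items.items():
    print(name, in_76_item_form(coeffs), in_49_item_form(coeffs))
```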

It proved profitable to examine the character of the valid 
items. Part VIII (Interpreting Graphs), provides the largest 
proportion of them, 16 of its 35 items. This may be due, in 
part, to the fact that Part VIII is the last test in the series;
but, only in part, for the more valid questions are rather inti- 
mately associated with particular graphs, and the most valid 
ones with the fourth in the series of five. None of the four 
questions based on a ‘‘pie diagram’’ was sufficiently valid to 
be included, and only one from each of the sets of questions 
based on the next two figures (line graphs). But the third 
assignment requires comparison of data from two graphs, one 
based on gross data, the other upon proportions. Eight of 
these ten questions have consistently significant validities. The 
questions based on the final diagram again call for comparisons 
and interpretations, and here six of the nine questions obtained 
significant validity coefficients.

Part III (Using General Reference Books) furnished 12
valid items from its 30 questions. In this test the student is
asked to select, from a list of nine, the reference work in which, 
e.g., ‘‘to find what has recently been written in magazines on 
the subject of evolution,’’ and ‘‘about New Deal legislation,’’ 
to cite the two most valid questions. 

Seven of the 20 questions in Part VI (Interpreting Maps), 
satisfied the criterion for validity. The correct answers to five 
of these questions depended upon the student’s previous knowl- 
edge of the location of four geographical areas in Asia. Nine 
of the 30 questions in Part VII (Knowing Current Periodical 
Literature), meet the criterion for validity. The two most 
valid items call for identification, from a list of ten, of the 
‘‘monthly which presents in condensed form articles from 
magazines of the preceding months,’’ and ‘‘a magazine noted 
for its high literary quality.’’ 

In Part IV (Recognizing Common Abbreviations), the most 
valid items are: op. cit., e.g., and q.v. Five others had validity
coefficients as large as 0.10 in two of the four lists (i.e., ibid.,
supra, circ., and infra). Data will be reported below which
indicate that there are predictive possibilities in this type 
of test. 

Part V (Using the Library Card Catalog), provided only two 
valid items. They are: ‘‘In the library card catalog, one may 
usually find (1) Books listed by subject, and (2) Cross refer- 
ences from one subject to another.’’ 

In sum, we find that Parts IV and V (Abbreviations, and 
Use of Card Catalog), made virtually no contribution to total 
validity, but that 30 per cent of the items in VII (Current 
Periodical Literature), 35 per cent of the questions in VI 
(Interpreting Maps), 40 per cent of those in III (Using Refer- 
ence Books), and 46 per cent of VIII (Interpreting Graphs), 
contributed consistently to the total validity of the test as 
a whole. 
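
The percentages in this summary can be checked against the item counts given for each part. In the sketch below, the figure of 30 items for Part III is an assumption, inferred from the top score of 30 shown for that part in Table 3 and from the 40 per cent quoted above. The other counts are those stated in the text.

```python
# (valid items, total items) for each part of the T-K Test.
valid_and_total = {
    "III (Using Reference Books)": (12, 30),  # total of 30 is an assumption
    "VI (Interpreting Maps)": (7, 20),
    "VII (Current Periodical Literature)": (9, 30),
    "VIII (Interpreting Graphs)": (16, 35),
}

percent_valid = {
    part: round(100 * valid / total)
    for part, (valid, total) in valid_and_total.items()
}

for part, pct in percent_valid.items():
    print(f"Part {part}: {pct} per cent of items valid")
```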


SUPPLEMENTARY STUDY OF THE ABBREVIATIONS TEST 


In the autumn of 1926 a test consisting of abbreviations and
foreign phrases had been used at Oberlin as part of a battery
administered to freshmen and had been found to have fair 
validity. In fact, it gave the best figures of any of the nine 
parts of the Ohio College Association Study Performance 
Tests, edited by H. A. Toops (5). With populations of 173 
men and 167 women, coefficients of .504 and .512 were obtained, 
using the first semester scholarship as the criterion (3). This 
test had been prepared by T. A. Langlie. It consists of 70 
items to be checked by matching against the appropriate 
answer. An item analysis was made to determine the more 
valid items, and the correlation was computed between the 
scholarship criterion and the scores made by the men on the 35 
items which yielded the highest validities. The figure is .511 
(as compared with .504 for the 70 item test). A new test was 
then constructed, consisting of 16 abbreviations and 23 foreign 
phrases, 35 from Langlie’s list and four from the Tyler-Kimber 
list. For the purpose of obtaining answers to be used in a 
multiple choice form, the test was administered to classes in 
elementary psychology in recall, or free association, form. 
From these answers a quadruple choice form of the test was 
constructed. This was then administered (1) to the students 
in psychology and (2) to 250 freshmen. In the free association
form the validity coefficient obtained for the students in psy- 
chology (151 sophomores, juniors and seniors) is .304; in 
multiple choice form (N is 139) it is .387. With the freshmen 
the multiple choice form yielded a coefficient of .423. 

The question which immediately comes to mind, however, 
is how this test of foreign phrases and abbreviations compares 
in validity with any other vocabulary test. A composite vo- 
cabulary test, consisting of Parts 1 and 2 of Form 20 of the 
OSU Test, and of Parts 1, 3, and 5 of the 1939 edition of the 
ACE Test, was administered. For these same 250 freshmen 
the validity coefficient for this vocabulary test is .524. As
the intercorrelation between the scores on this vocabulary test
and the test of foreign phrases and abbreviations is .638, it is
apparent that the abilities measured by the two tests overlap
to considerable extent. In fact, the multiple R obtained by
using both tests raises the coefficient by a mere .012 (or from 
.524 to .536). It may be concluded, therefore, that the better 
the abbreviations test the nearer it approximates a good 
vocabulary test. 
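
The small gain from combining the two tests can be reproduced from the three coefficients quoted above with the usual two-predictor formula for the multiple correlation. This is a sketch only. It assumes that the freshman coefficient of .423 for the abbreviations test is the one that entered the authors' computation, and the function name is ours.

```python
import math

def multiple_r(r_y1: float, r_y2: float, r_12: float) -> float:
    """Multiple correlation of a criterion y with two predictors."""
    r_squared = (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)
    return math.sqrt(r_squared)

r_vocab = 0.524        # vocabulary test with scholarship
r_abbrev = 0.423       # abbreviations test with scholarship (freshmen)
r_between = 0.638      # intercorrelation of the two tests

R = multiple_r(r_vocab, r_abbrev, r_between)
gain = R - r_vocab
print(f"multiple R = {R:.4f}, gain over vocabulary test alone = {gain:.4f}")
```

With these rounded inputs the result agrees with the reported multiple of .536 to within rounding, which illustrates how little a second test adds when it correlates .638 with the first.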


SUMMARY 


(a) The comparative validity of the Tyler-Kimber Study 
Skills Test was determined for two freshmen classes, an item 
analysis singled out the most significant questions in this test, 
and a new test of foreign phrases and abbreviations was 
validated. 

(b) The rank order of the validity coefficients is as follows:
(1) H. S. Schol., (2) OSU Test, (3) ACE Test, (4) T-K Test
(76 items), (5) Schol. Est., (6) T-K Test (49 items), (7) T-K
Test (Total), (8) Boyington Test.

(c) The study confirmed an earlier evaluation of rating
scales used with candidates for admission to college (2).

(d) A coefficient of .729 is obtained for the men of one class
(and an average of .723 for two classes) for the multiple
composed of H. S. Schol., OSU Test, the 49 item T-K Test, and
Schol. Est.

(e) As a prognostic instrument for discovering students 
likely to have scholastic difficulty, the T-K Test proved to be 
about half as valuable as the OSU Test. With the OSU Test 
held constant the partial coefficients of correlation between the 
T-K Test and first semester scholarship for the men of the two 
classes are .242 and .083. 

(f) Parts I and II of the T-K Test are not sufficiently
difficult for the subjects of this study.

(g) The most valid items in the T-K Test are those (1st) 
requiring a comparison of data obtained from two different 
graphs, or (2nd) the interpretation of data presented in
graphical form; (3rd) those whose answer depends upon information
concerning geographical or bibliographical facts, and (4th)
knowledge of the meaning of certain abbreviations.


CONCLUSIONS


When used with freshmen as highly selected as the Oberlin 
students, the Tyler-Kimber Test does not prove to be a dif- 
ferential measure of their study skills. Those portions of the 
battery which, according to title, bear the closest a priori 
relationship to ‘‘study skill’’—Book Organization, Use of the 
Index, References and the Card Catalog—have little validity. 
On the other hand, the portions of the test which do show some 
relationship to scholastic aptitude are those demanding a 
comparison between, or interpretation of, materials obtained 
from graphs, or calling for specific sorts of geographical or
lexigraphical information. The addition of graphical material to
the ‘‘reading comprehension’’ section of a psychological ex- 
amination should accomplish all the results claimed for the 
Study Skills Test.


THE VALIDITY OF SELF-ESTIMATED
INTERESTS* 


D. J. MOFFIE 
North Carolina State College 


Psychologists, counselors, personnel workers, and
educators have recognized the fact that one requirement
necessary for success in any given occupation is
interest in that occupation. Strong¹ states: ‘‘It is assumed
that, if a man likes to do the things which men like who are
successful in a given occupation and dislikes to do the things
which these same men dislike to do, he will feel at home in that
occupational environment. Seemingly, also, he should be more
effective there than somewhere else because he would be
engaged, in the main, in the work he liked.’’ This fact is also
reflected in the number of articles published within the
last two years on interests. The vocational guidance
counselor, therefore, often asks his patient the direct question,
‘‘Are you interested in law, medicine, engineering, teaching,
or some other specific occupation?’’

The purpose of this investigation was to determine the valid- 
ity of the above question. Specifically, what is the relation- 
ship between self-estimated interests and those interests as 
measured by the Strong Interest Blank? The study grew out 
of the fact that, in counseling, a close relationship appeared to 
exist between self-analysis and measured interests. This con- 
clusion, however, was drawn arbitrarily in clinical work with 
college students. 

* Read at the annual meeting of the Southern Society for Philosophy
and Psychology at Nashville, Tennessee, in April, 1942.

¹ E. K. Strong, Jr., Manual for Vocational Interest Blank for Men.
Stanford University, California, Stanford University Press, 1940.

In a survey of past literature we find that there are two
studies closely related to the present investigation; the first
by Bedell and the second by Crosby and Winsor. Bedell,² in
answer to the question of relationship between self-analysis 
and measured interests, administered the Strong Interest 
Blank to 141 freshman women in the Teachers College of the 
University of Nebraska. A questionnaire somewhat similar 
to the one used in this study was given to each of the 141 girls. 
Pearson coefficients were determined between the self-analysis 
score and the score obtained from the Strong. Bedell found 
that thirteen of the coefficients exceeded .22, although none of 
these thirteen was high. Only the coefficients for stenogra- 
pher-secretary and teacher of social sciences exceeded .50. In 
addition, relationships were determined between estimated 
scores and other occupations. He concludes, ‘‘Scores for all 
occupations except two were found to offer no better basis for 
the prediction of self-estimates for the given occupation than 
for some other occupation. No close relationships between
self-estimated and measured vocational interests were found.’’


Crosby and Winsor,³ in answer to the same problem studied
by Bedell, administered the Kuder Preference Test to 222 stu- 
dents in the colleges of Agriculture and Home Economics at 
Cornell University. There were 127 men and 95 women in the 
group studied. Each student was asked to estimate his posi- 
tion, in relation to the general college population, on the seven 
main types of interests as measured by the Kuder. Then, the 
students were asked to fill out the Preference record. An 
average r of 0.54 was found between the estimated and tested 
interests of the students in this study. The authors conclude 
that, ‘‘This indicates that interest inventories may probably 
be used quite profitably to supplement the student’s own 
opinion in his interests.’’ It is to be noticed that there is a 
discrepancy between the results of the first and second study. 
Perhaps this difference may be due to the type of test used. 


² Ralph Bedell, ‘‘The Relationship between Self-Estimated and Measured
Vocational Interests,’’ J. Appl. Psychol., 1941, 25, 59-67.

³ R. C. Crosby and A. L. Winsor, ‘‘The Validity of Students’ Estimates
of Their Interests,’’ J. Appl. Psychol., 1941, 25, 408-415.

The subjects of this study consisted of eighty N.Y.A. stu- 
dents taken at random from the N.Y.A. center located at 
Raleigh, North Carolina. The mean age of the group was 
18.7. The range was eight years. The students studied were 
enrolled in one of the three curricula—radio communication, 
wood shop, or machine shop. 

A questionnaire, on which were given the main groups and 
occupations, as reported by the Strong Interest Blank, was 
administered to these students. Each student was asked to 
rate his interests for the groups and specific occupations. The
Strong blank was administered immediately after the students
had checked the questionnaire. The relationship between the two
instruments was not pointed out to the students.

The form used for checking the estimates is shown in Figure 
1. A millimeter scale was used in scoring each response. The
standard scores on the Strong test were used. Pearson coeffi- 
cients were then obtained between estimated ratings and the 
scores on the Strong test. 

These coefficients, with their probable errors, are reported in 
Table 1. Correlations were obtained for the six main groups 
and for twenty other professions. The other occupations re- 
ported were selected because these were the only keys avail- 
able to us at the time of the study. However, it appears rea- 
sonable to believe that comparable results would be obtained 
even if these had been selected at random. 
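
The paper does not state how the probable errors were computed. A minimal sketch, assuming the conventional formula of the period, P.E. = .6745 (1 - r²) / √N, gives values in line with the entries that can still be read in Table 1, such as .22 with a probable error of .07 for N = 80.

```python
import math

def probable_error_r(r: float, n: int) -> float:
    """Probable error of a Pearson r (conventional formula; an assumption here)."""
    return 0.6745 * (1 - r**2) / math.sqrt(n)

N_STUDENTS = 80  # size of the N.Y.A. sample

for r in (0.20, 0.22, 0.26):
    pe = probable_error_r(r, N_STUDENTS)
    print(f"r = {r:.2f}, probable error = {pe:.3f}")
```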

Coefficients for the six groups ranged from —.07 to +.47. 
The +.47 was obtained for the group composed of banker, 
office man, accountant, and purchasing agent. The coefficient 
of alienation indicates that this is only twelve per cent better 
than pure chance, so that when viewed by itself its predictive
value is quite low.

Relationship indices ranged from — .05 to + .54 for the other 
twenty occupations. The two highest were + .54 for musician 
and +.53 for personnel manager. These indicate approxi- 
mately sixteen per cent better than pure chance. 
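
The "per cent better than pure chance" figures follow from the coefficient of alienation, k = √(1 - r²), with the improvement over chance taken as 1 - k. The sketch below applies that relation to the coefficients quoted in the text. The function names are ours.

```python
import math

def coefficient_of_alienation(r: float) -> float:
    """k = sqrt(1 - r^2): the share of error left after prediction from r."""
    return math.sqrt(1 - r**2)

def percent_better_than_chance(r: float) -> float:
    """Forecasting efficiency, 100 * (1 - k)."""
    return 100 * (1 - coefficient_of_alienation(r))

for label, r in [("Group VIII", 0.47), ("Personnel manager", 0.53), ("Musician", 0.54)]:
    print(f"{label}: r = {r:.2f}, {percent_better_than_chance(r):.1f} per cent better than chance")
```

Even the highest coefficient in the study therefore reduces the error of prediction by only about one sixth.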


⁴ The Strong scores were obtained from a larger group of data gathered
in cooperation with Dr. William McGehee.


Fig. 1. Sample section of questionnaire.


DEPARTMENT OF PSYCHOLOGY
North Carolina State College
Vocational Interest Inquiry

Name ....................    Age ..........
Curriculum ..............    Class ........

The purpose of this inquiry is to see how well you like certain
occupations. You are asked to indicate after each occupation or group of
occupations listed below and on the next sheet the extent to which you would
be interested in that kind of work. Do not consider salary, social
recognition, advancement, skill, ability, or training. Merely indicate how you
would enjoy that kind of work.

Make a straight, vertical mark on the line opposite each occupation or
group of occupations at the position which best describes your interests.
It need not necessarily be under a descriptive phrase.

                    No         Little     Average    Above Average  Extreme
                    Interest   Interest   Interest   Interest       Interest
Example:
Dentist:            |----------|----------|----------|--------------|

It is possible to make a rough classification of different occupations
involving like interests. These have been listed below. Rate yourself on
each group. Work rapidly. Your first impression is desired.

                    No         Little     Average    Above Average  Extreme
                    Interest   Interest   Interest   Interest       Interest
Group:
I. Applied Sciences
   Doctor, dentist,
   psychologist,
   architect, artist |----------|----------|----------|--------------|

Rate yourself on each occupation given below. Work rapidly. Your
first impression is desired.

                    No         Little     Average    Above Average  Extreme
                    Interest   Interest   Interest   Interest       Interest
Artist:             |----------|----------|----------|--------------|
Psychologist:       |----------|----------|----------|--------------|

TABLE 1

Coefficients of Correlation with Probable Errors Between Self-Estimated
Interests and Interests as Measured by the Strong Vocational
Interest Blank for Eighty N.Y.A. Boys

Groups                                                 Coefficient with
                                                       probable error

Doctor, Dentist, Psychologist, Architect, Artist       .20 ± .07
Chemist, Engineer, Mathematician                       .36 ± .06
Personnel manager, YMCA secretary, Physical
  director, Minister, Social science teacher,
  City school superintendent                           [illegible]
Accountant, Purchasing agent, Banker, Office man       .47 ± [illegible]
Sales Manager, Real Estate Salesman, Life Ins.
  Salesman                                             [illegible]
Advertising man, Author-journalist, [illegible]        [illegible]

Occupations                                            Coefficient with
                                                       probable error

Forest Service                                         .26 ± .07
Architect                                              .06 ± .07
Mathematician                                          .22 ± .07
Production Mgr.                                        .24 ± [illegible]
Personnel Mgr.                                         .53 ± [illegible]
Musician, Certified Public Accountant, Sales
  Manager, Real Est. Salesman, Life Ins. Salesman,
  Author-Journalist, Pres. Mfg. Concern, Physician,
  Math.-Science Teacher, YMCA Secretary                [illegible]


TABLE 2

Table of Distribution for Measured and Estimated Scores

                              Measured          Self-estimated
Occupations                  A    B    C         A    B    C

Architect                    1   19   60         4   41   35
Mathematician                5   17   58        11   39   30
Engineer                     7   52   21        18   54    8
Production Manager          20   48   12         0   52   28
Farmer                      35   42    3         4   38   38
Carpenter                   27   44    9         4   47   29
Forest Service               8   27   45         2   49   29
Personnel                    5   18   57         2   45   33
Musician                     7   42   31         8   46   26
C. P. Accountant             0   10   70         2   38   40
Banker                       4   57   19         3   44   33
Sales Manager                0   41   39         4   47   29
Real Estate Salesman        10   51   19         3   30   47
Life Ins. Salesman           3   41   36         3   33   44
Lawyer                       1   24   55         3   35   42
Author-Journalist            2   28   50         8   36   36
Pres. Mfg. Concern          12   40   28         8   42   30
Physician                    3   40   37         2   39   39
Math.-Science Teacher       18   35   27         5   33   42
YMCA Gen. Secretary          1   17   62         2   34   44

                              Measured          Self-estimated
Groups                       A    B    C         A    B    C

Group I                     11   48   21         2   55   23
Group II                     9   51   20        13   57   10
Group V                     18   [remainder of row illegible]
Group VIII                  17   50   13         8   50   22
Group IX                     8   [remainder of row illegible]
Group X                      6   50   24         5   46   29


Table 2, which shows the distribution of measured and
estimated scores, points out the fact that the group was quite
heterogeneous. At least a good discrimination of scores
appears to be present. No consistency is apparent between
measured and estimated scores from an analysis of these tables.
An inspection of the measured group scores shows that the
discrimination appears to be better than for other occupations.
In both estimated and measured interests there is a larger
number of C scores than A scores. The greater number of
estimated A scores occurs for mathematician, engineer, and
group II (composed of mathematician, engineer and chemist).
In the measured interests, however, the largest number occurs 
for production manager, farmer, and carpenter. If the Strong 
is a true measure of interest, then it may be stated that these 
students have aspiration levels beyond those of their true 
interests. 

It might be stated that having likes and dislikes typical of 
successful people in specific occupations is quite different from 
an interest in the type of work performed in that occupation. 
This, however, does not seem likely. To have likes and dis- 
likes typical or atypical must in reality mean that your inter- 
ests are the same or different. Furthermore, it is quite con- 
ceivable that those people successful in certain lines of work 
must be interested in activities required by that work. This, 
however, would be arguing over a question which is still un- 
answered. The lack of consistency between estimated and 
measured interests is most likely due to a lack of maturity and 
experience on the part of the student. Interests develop be- 
cause of maturation and experience in certain lines of work. 
Without these experiences, the significance of general items 
will not be fully recognized. An inspection of the items in 
the Strong bears out this point. Gross activities are listed and
without previous experiences no reliable response can be made
to them. This fact would tend, then, to reduce the validity
of the measured scores. It seems reasonable to believe that
lack of maturity would invalidate such an instrument as the 
Strong much less than a self-estimate because of the very 
nature of the test itself. A trend or constellation of interests 
may be picked up by a test, whereas it may go unnoticed in 
a self-analysis. Strong has stated that his test should not be 
used below age 17. In fact he has indicated that it should be 
used with a lot of precaution below age 25. 

With the same hypothesis in mind, we may use this argu- 
ment in the explanation of differences between the present 
study and the one conducted by Crosby and Winsor. The 
Kuder test which was used in the Crosby and Winsor study re- 
quires answers to questions which are more specific in nature 
than those in the Strong. The items in the Kuder test have 
been developed with the thought in mind of providing familiar 
experiences to which the student is to react. We could expect, 
then, more reliable responses, and, in the long run, higher 
relationships between measured and estimated scores. The 
following conclusions may be drawn: 

1. Only one group—that composed of banker, office man, 
accountant and purchasing agent and two specific occupa- 
tions, personnel manager and musician—showed high enough 
relationships with estimated scores to be significant. 

2. On the basis of this study, at least for the age group in- 
vestigated, definite precaution should be used by the coun- 
selor or personnel worker in determining the values of an 
estimated interest. 

3. It appears to the writer that the measured score would be 
more indicative of actual trends in interests but that an esti- 
mated score should not be completely disregarded. 

4. This study, in addition to answering its original purpose, 
suggests the fact that greater precaution should be observed 
in the type of interest test selected and the year level for which 
it is to be used. 





A MULTIPHASIC PERSONALITY SCHEDULE 
(MINNESOTA): IV. PSYCHASTHENIA* 


J. C. McKINLEY, M.D., and S. R. HATHAWAY 


Department of Psychology and the Division of Nervous and 
Mental Diseases, University of Minnesota 


An earlier paper of this series (1) described the Multi- 
phasic Personality Schedule and its projected lines of 
development. The Multiphasic Schedule differs from 
traditional personality instruments in the deliberate attempt 
to include among the items as many as possible of those that 
might give information of importance clinically, without re- 
gard to the particular phase of personality upon which the 
item may bear. This initial concept explains the name given 
to the inventory. The experimental schedule includes 504 
items which are printed upon separate cards. The subject 
responds to each card by filing it behind one of three index 
guide cards on which are printed ‘‘True,’’ ‘‘False,’’ or ‘‘Can- 
not Say.’’ 

The experimental items have now been administered to 
about 3000 individuals of various normal and abnormal classi- 
fications. The chief normal group against which all hos- 
pitalized abnormal groups are considered is comprised of 
adults to whom the items were administered when they came 
as visitors or brought patients to the University Hospital. 
The only requirement for inclusion of these persons as normal 
was that they said they were not under a doctor’s care at the 
time of testing. The word ‘‘normal’’ as used herein never 
implies more than this. The normal group so obtained repre- 
sents a reasonably accurate cross section of the Minnesota 


* Prepared on Work Projects Administration Official Project No. 
165—1-71-124, Sub-Project No. 379. Supported in part by a research 
grant from the Graduate School of the University of Minnesota. 
population with some over-emphasis of the rural population. 
The modal scholastic achievement is eighth grade and occupa- 
tional ratings indicate that the modal occupational level is 
approximately that of the general adult population. 

Two scales have now been derived and published. The first 
of these was a scale for the measurement of hypochondriasis and 
the second one for the measurement of symptomatic depres- 
sion. The present paper deals with the measurement of 
psychasthenia.1 


A. DERIVATION OF THE PSYCHASTHENIA SCALE 


The psychiatric classification of psychasthenia is applied to 
a group of individuals whose thinking is characterized by 
excessive doubt, by compulsions, obsessions, and unreasonable 
fears; these persons are often seen in psychiatric hospitals but 
are encountered much more frequently among normal groups 
by counselors and personnel workers. Certain phobias such 
as the fear of spiders, of snakes or of windstorms are wide- 
spread among the population, but similar phobias become so 
strong and so numerous in some persons as to afford a source 
of considerable maladjustment vocationally, socially or other- 
wise. Often a psychasthenic individual is characterized not 
so much by well-marked fears of individual things or acts as 
by great doubts as to the meaning of his reactions in what 
seems to be a hostile environment. In other cases the phobia 
becomes attached to certain acts or thoughts of the subject 
in such a way that he is forced through fear to compulsively 
perform needless, disturbing or personally destructive acts or 
to dwell obsessively upon lines of thought which have no sig- 
nificance for his normal activities. 

Compulsive acts are always characterized by the need felt 
by the subject to perform them without regard to rational 
considerations. For example, he may always be forced to 
count objects or to touch a certain spot on a wall or to avoid 


1 The Minnesota Multiphasic Personality Schedule is now available for 
purchase from The University Press, Minneapolis, Minnesota. 


stepping on sidewalk cracks. If he fails to do these things 
he feels uncomfortable; if he does them he is forced to rational- 
ize and justify his acts. Obsessive thinking is itself commonly 
accompanied by anxiety so that the patient may be tense and 
anxious over the content of his thoughts as when he thinks 
over and over again that he is useless. Similarly, he may find 
himself anxiously obsessed with such ideas as the impending 
likelihood that he will faint or that something terrible or 
threatening is about to happen. Again, he may be forced to 
think things which, while not in themselves producing anxiety, 
through his impatience and preoccupation with the fact that 
he cannot stop thinking them, do secondarily produce an 
anxious reaction; for example, compulsive counting itself has 
little attached anxiety since the patient is merely forced to 
count everything that he sees, but he may worry so much over 
his inability to stop counting as to have anxiety as a large com- 
ponent in his thinking. The general reaction type character- 
ized by these compulsive and obsessive acts and thoughts is 
called psychasthenia. The word derives from the concept of 
a weakened will that cannot resist the behavior regardless of 
its maladaptive character. 

The development of a scale for the measurement of the 
general symptomatic traits which are classed under the psy- 
chiatric designation of psychasthenia has demonstrated that 
there is an identifiable personality pattern underlying the 
varying symptomatic picture from case to case. Many of the 
items making up the psychasthenia scale are clearly much more 
general than the specific compulsions or phobias and apply to 
a more general personality make-up of which the subject is 
usually entirely unaware. 

The methods of derivation for the present scale differ only 
in detail from the methods used in the scales reported earlier 
(2,3). Unfortunately for the present study not many entirely 
satisfactory criterion cases of psychasthenia come into the 
closed wards of a psychiatric clinic. Many more are seen in 
the outpatient clinic or are advised by lay counselors and are 
never severely handicapped. Because we have felt unsure in 
the use of even carefully studied inpatients for purposes of 
scale derivation, we have avoided using criterion cases from 
the outpatient clinic. The criterion group is thus small and not 
entirely homogeneous. At least one of the cases appears to 
have been incorrectly diagnosed. Fortunately, the trait itself 
is the most homogeneous one so far described so that correla- 
tions of items with total score could be used as a guide. Other- 
wise, we would hesitate to publish the results with so few 
criterion cases. 

The chief subjects for scale derivation consisted of (a) 139 
normal married males between the ages of 26 and 43, and 200 
normal married females between the ages of 26 and 43, (b) a 
group of 265 college students as a check of effect of age on 
item frequency, (c) a group of 20 psychiatric patients care- 
fully selected as probable psychasthenia cases. 

The criterion group included patients who had been in- 
tensively studied medically and psychiatrically and in whom 
the final diagnosis was psychasthenia in one or another form. 
Unfortunately, as mentioned above, it was necessary not only 
to use a rather small criterion group, but also to include in 
the group several persons who, as it subsequently developed, 
were probably not appropriate. For example, the two of this 
group who received the lowest final scores were young persons, 
one of 16 and the other of 17 years. One of these was not at 
all similar in item responses to the remainder of the criterion 
group. It is probable that this 16-year-old boy was wrongly 
diagnosed. These two young patients are the two cases testing 
lowest in the criterion group. (See Figure 1.) 

A preliminary step in deriving this scale was the tabulation 
of the item responses of the criterion group in contrast to the 
norm groups of men and women and the college students con- 
sidered as a mixed normal group selected for age and for 
scholastic aptitude. All items that showed a differentiation of 
two or more times the standard error of the difference between 
the criterion and all of the normal groups were chosen as a 
preliminary scale for psychasthenia. All available normal and 
psychiatric cases were then scored on this preliminary scale. 
It was possible at this point to check whether or not the scale 
seemed to be working in the right direction and to determine 
its apparent variability. Since the scale as derived in this 
preliminary fashion appeared to be unusually homogeneous 
and since there were other potentially useful items that had 
been doubtful in the statistics of the comparison of criterion 
and normal groups, item correlations were used to test all 
preliminary scale items as well as certain of the doubtful 
items not included in the preliminary scale. 
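
The preliminary screen described above can be sketched in a few lines. This is only an illustration under stated assumptions: the standard error of the difference is taken as the usual unpooled formula for two percentages, the function names are hypothetical, and the counts in the usage note are invented rather than taken from the study.

```python
import math

def critical_ratio(hits_a, n_a, hits_b, n_b):
    """Difference between two proportions divided by its standard error.

    Assumes the unpooled standard error sqrt(p1*q1/n1 + p2*q2/n2).
    """
    p_a = hits_a / n_a
    p_b = hits_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    if se == 0:
        # Both proportions are 0 or 1; the ratio is undefined unless they agree.
        return 0.0 if p_a == p_b else math.copysign(math.inf, p_a - p_b)
    return (p_a - p_b) / se

def passes_screen(hits_crit, n_crit, normal_groups, threshold=2.0):
    """True when the item separates the criterion group from EVERY normal group.

    normal_groups is a list of (hits, n) pairs, one per normal group.
    """
    return all(
        abs(critical_ratio(hits_crit, n_crit, hits, n)) >= threshold
        for hits, n in normal_groups
    )
```

With invented counts, `passes_screen(14, 20, [(30, 139), (50, 200), (60, 265)])` asks whether an item answered in one direction by 14 of 20 criterion cases differs by at least two standard errors from each of three normal groups.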
Fig. 1. Composite polygon of 690 normal cases constructed with the 
abscissa scaled in standard scores. Above the polygon are plotted the 20 
criterion cases and 50 cases that were noted as having some symptomatic 
evidence of psychasthenia. 


Tetrachoric correlations were obtained for every preliminary 
item and all the doubtful items against total scores on the 
preliminary scale for a sample of 100 normal persons and for 
a sample of 100 randomly selected psychiatric patients. These 
data combined with the original comparison data of criterion 
and normal cases permitted us to select a final scale of 48 items. 
The following list contains all of these final scale items each 
followed by a ‘‘T’’ or an ‘‘F’’ to indicate the direction of 
the scored answer. After each item is given the tetrachoric 
correlation of the item with total score on the preliminary 
scale. The first figure is the correlation from normal cases 
and the second that from psychiatric cases. It was assumed 
that items were valid if they correlated with either group. 
In some cases only one correlation is given since the cell fre- 
quencies for a response might be too low to obtain a valid 
indication of the other correlation. A few of the items with 
low correlations were retained because the item had appeared 
very strong in the criterion group. The items are merely 
counted +1 when answered in the indicated direction. 

I seldom worry about my health 
At times I have fits of laughing and crying that I cannot control 
I seem to be about as capable and smart as most others around me 
My memory seems to be all right 
I feel weak all over much of the time 
I cannot understand what I read as well as I used to 
There seems to be a lump in my throat much of the time 
I wake up fresh and rested most mornings 
Most nights I go to sleep without thoughts or ideas bothering me 
I almost never dream 
I like to study and read about things that I am working at 
I do many things which I regret afterwards (I regret things more or more often than others seem to) 
In school I found it very hard to talk before the class 
I am easily embarrassed 
I am more sensitive than most other people 
I easily become impatient with people 
Even when I am with people I feel lonely much of the time 
I wish I could be as happy as others seem to be 
My daily life is full of things that keep me interested 
I have had periods of days, weeks, or months when I couldn’t take care of things because I couldn’t ‘‘get going’’ 
I frequently find myself worrying about something 
Most of the time I feel blue 
Much of the time I feel as if I have done something wrong or evil 
I feel anxiety about something or someone almost all the time 
Once a week or oftener I become very excited 72 
I have periods of such great restlessness that I cannot sit long in a chair 75 
Sometimes I become so excited that I find it hard to get to sleep 50 
I forget right away what people say to me 74 
I usually have to stop and think before I act even in trifling matters 
I have a habit of counting things that are not important such as bulbs on electric signs, and so forth 
Sometimes some unimportant thought will run through my mind and bother me for days 
Bad words, often terrible words, come into my mind and I cannot get rid of them 
Often I cross the street in order not to meet someone I see 
I have strange and peculiar thoughts 
I get anxious and upset when I have to make a short trip away from home 
Almost every day something happens to frighten me 
I have been afraid of things or people that I knew could not hurt me 
I have no dread of going into a room by myself where other people have already gathered and are talking 
I am afraid of losing my mind 
My hardest battles are with myself 
I have more trouble concentrating than others seem to have 
I have several times given up doing a thing because I thought too little of my ability 
I find it hard to keep my mind on a task or job 
I am inclined to take things hard 
Life is a strain for me much of the time 
I certainly feel useless at times 
I am certainly lacking in self confidence 
Once in a while I think of things too bad to talk about 
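
The item-total correlations mentioned above are tetrachoric, which requires a fourfold table for each item. The sketch below uses the cosine-pi approximation, a common shortcut that is not necessarily the computing method used in the study. The cell layout, the `min_cell` cutoff (standing in for the remark that cell frequencies may be too low), and the idea of splitting total score into high and low halves are all assumptions made for illustration.

```python
import math

def tetrachoric_cos_pi(a, b, c, d, min_cell=5):
    """Cosine-pi approximation to the tetrachoric correlation.

    Assumed fourfold table layout:
        a = item answered in scored direction, total score high
        b = item answered in scored direction, total score low
        c = item not answered in scored direction, total score high
        d = item not answered in scored direction, total score low
    Returns None when any cell is below min_cell, since the estimate
    is then too unstable to report.
    """
    if min(a, b, c, d) < max(1, min_cell):
        return None
    ratio = math.sqrt((a * d) / (b * c))
    return math.cos(math.pi / (1.0 + ratio))
```

For a sample of 100 split as `tetrachoric_cos_pi(40, 10, 10, 40)`, the cross-product ratio is 16 and the approximation gives the cosine of pi/5. A table with an empty or nearly empty cell returns `None`, which corresponds to the cases where only one of the two correlations could be reported.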

The distributions of Figure 1 show graphically the scores 
of the normal, the criterion cases and fifty symptomatic cases 
on the final scale. The scores used in Figure 1 are standard 
score equivalents derived from the statistics for 293 normal 
males and 397 normal females between the ages of 16 and 45. 
Two standard score tables were used, one for the males and 
one for the females. This cancels the sex differences. (See 
Table 1.) The standard scores were fitted to a mean of 50 
and a standard deviation of 10. 
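
The conversion to standard scores can be sketched as follows. The linear formula follows directly from fitting a mean of 50 and a standard deviation of 10. The norm values in `NORMS` are hypothetical placeholders, not the figures of Table 1, and the function and variable names are illustrative.

```python
# Hypothetical raw-score norms (mean, standard deviation) per sex.
# These are placeholders for illustration, NOT the values of Table 1.
NORMS = {
    "male": (10.0, 7.0),
    "female": (12.0, 7.5),
}

def standard_score(raw, sex, norms=NORMS):
    """Convert a raw scale score to a standard score (mean 50, SD 10),
    using the norm table for the subject's own sex."""
    mean, sd = norms[sex]
    return 50.0 + 10.0 * (raw - mean) / sd
```

Because each sex is referred to its own norms, a man and a woman who each score exactly at their group mean both receive 50, which is the sense in which separate tables cancel the sex difference.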


B. VALIDITY AND INCIDENTAL FINDINGS 


There was relatively little change in score with age. Table 1 
shows the raw score statistics for ten-year intervals from 16 
to 65. The college student group deviates markedly from the 
group of similar age chosen at random from the population. 


TABLE 1 
There is some difference between the sexes as observed with 
other scales but without further study no special significance 
should be attached to this difference. 

It is, unfortunately, not possible to estimate the validity of 
the psychasthenia scale by testing it on a new group of psy- 
chiatric patients diagnosed psychasthenia only but not used 
in the derivation of the scale. A few new cases have been 
diagnosed by the clinic but another year will be required for 
the accumulation of a sufficiently large group to permit this 
type of statistical validation. Nevertheless additional indi- 
viduals so far obtained by clinical diagnosis have been deviates 
on the scale. 

The evidence of validity as given by the psychiatric cases 
with clinical symptoms of some degree of psychasthenia is rela- 
tively clear and positive. This has been shown by experience 
in the clinic but is more graphically shown in Figure 1. The 
distribution marked symptomatic cases represents 50 psy- 
chiatric cases very heterogeneous in diagnosis but with the one 
common characteristic that they were marked by the staff as 
having some symptomatic evidence of obsessions or compul- 
sions. Since none of these cases was finally diagnosed psychas- 
thenia and since the clinician frequently overemphasizes symp- 
toms as seen in a person otherwise abnormal, the cases should 
not be expected to be uniformly high in the scale. Nevertheless 
the trend toward high scores for the group is clearly signifi- 
cant. Only ten per cent fall below the mean for the normal 
group. 

TABLE 2 

Table 2 lists the means and standard deviations of several 
groups in comparison to the normal group. In contrast to 
previously derived scales, the physically ill individuals from 
other portions of the hospital test very little above normals 
not in the hospital. Psychiatric cases without recorded evi- 
dence of psychasthenia test above the normal average but 
the staff frequently fails to record the presence of some 
psychasthenic traits even though they are observed since the 
trends are not disabling. 

Several measures of reliability are available. For a group 
of 47 normal cases retested at intervals of never less than 
three days and up to more than a year, the test-retest relia- 
bility coefficient is .74 ± .15. Most of these cases were em- 
ployees and staff but none knew that the test was to be re- 
peated. The standard deviation of this group was 4.9 on the 
first test as compared to over 7 on general normals and un- 
doubtedly the coefficient obtained represents a low limit rather 
than a true test-retest correlation value. The split half coeffi- 
cient obtained from a group of 200 random normal cases is 
.84 ± .07. When a similar sample of 100 psychiatric cases 
selected at random is used, the correlation is .89 ± .10. When 
these two correlations are statistically corrected to a full 
length test, they are .91 ± .07 and .94 ± .10. 
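
The correction to full length is presumably the Spearman-Brown formula, which is the standard way to step a split-half coefficient up to the whole test. The text does not name the formula, so that identification is an assumption, but it reproduces the reported figures. The function name is illustrative.

```python
def spearman_brown(r_part, factor=2.0):
    """Estimated reliability of a test lengthened by `factor`,
    given the correlation r_part between parts of the original length.
    factor=2 steps a split-half coefficient up to the full test."""
    return factor * r_part / (1.0 + (factor - 1.0) * r_part)

normals = spearman_brown(0.84)      # split-half from 200 normal cases
psychiatric = spearman_brown(0.89)  # split-half from 100 psychiatric cases
print(round(normals, 2), round(psychiatric, 2))  # prints 0.91 0.94
```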

The test intercorrelation with hypochondriasis as measured 
by H − C_H is .06 ± .10 as obtained from 100 normals. The 
intercorrelation with depression as measured by D (symbols 
refer to the two previously published scales) on the same 
group was .44 ± .10. When 100 miscellaneous psychiatric 
cases are used, these two correlations are, with H − C_H, .28 ± .10 
and with D, .69 ± .10. The rise in the correlation with depres- 
sion with psychiatric cases is probably to be expected since the 
complaint factors involved in psychasthenia are dynamically 
related to depression so that many persons tend to have the 
psychasthenic type of fears in greater degree as their morale 
becomes lower, and conversely to be reactively more depressed 
as they are troubled by psychasthenia. 


C. SUMMARY 


The psychiatric designation psychasthenia, as used in the 
present study, refers to a group of individuals who are fre- 
quently troubled by compulsions, obsessions, and phobias and 
who are often disabled by vacillation, excessive worry and lack 
of confidence. Through the differential study of persons hav- 
ing psychiatric evidences of psychasthenia, a scale was derived 
which is internally homogeneous and which differentiates 
clinic patients from normals in a large percentage of cases. 
Further evidence of validity is given by the fact that on the 
average persons exhibiting psychasthenic symptoms to only a 
minor degree score significantly higher than normals. 


A NOTE ON THE VALUE OF CUSTOMARY 
MEASURES OF ITEM VALIDITY 


R. M. W. TRAVERS 
Ohio State University 


THE PROBLEM 


It is a common practice in the construction of information 
tests to build them from a file of items in which there is 
listed against each item a measure of its validity. These 
validity coefficients are usually calculated on the basis of the 
correlation between passing and failing on the item and the 
total score on an information test in the same area. Such 
validity coefficients together with the difficulty coefficients also 
given are usually used as the basis for selecting items for tests. 
Many colleges and university departments that use informa- 
tion tests of the objective type build up each new examination 
in this way by drawing their items from a file of previously 
used items to which new items are added from time to time. 
Each time an item is used, it is customary to recalculate its 
validity coefficient and thus it happens that on most items in 
the general file there is information regarding the validity of 
the item on a number of occasions. The theory underlying 
this procedure is that when an item has consistently low valid- 
ity, then the item is either discarded from the file or changed 
in such a way as to improve it. The use of validity coefficients 
for this purpose can be justified only if two assumptions are 
sound. The first of these assumptions is that these validity 
coefficients remain fairly consistent when the items are used on 
different occasions. The second assumption is that the valid- 
ity coefficients calculated in the way described are actual 
measures of the true validity of the item. In this paper it is 
proposed to examine some data to see how far the first of these 
assumptions is capable of being defended. Then in the light 
provided by this evidence, it will be possible to discuss the 
second assumption. 

PREVIOUS STUDIES 


There is a very large literature dealing with the calculation 
and use of the customary measures of validity of test items. 
The usual procedure recommended is to calculate the validity 
coefficients from the tails of the total distribution of test scores. 
The advantage of this procedure is that it is short since it in- 
volves no more than the calculation of the percentage passing 
the item in a group scoring high on the total test and the per- 
centage passing the item amongst a low scoring group. The 
correlation between the item and the total test can then be read 
from a table. Kelley (3) suggests that the optimum size of 
these high and low groups is 27 per cent of the total but in 
spite of the sound theoretical basis for Kelley’s suggestion, the 
more common practice is to calculate item validity coefficients 
from the highest and lowest 25 per cents. Although the stud- 
ies dealing with such computational matters are numerous, 
there are few that are concerned with the matter of the relia- 
bility of such coefficients once they have been calculated. 

Carter (1) determined the reliability of the validity coeffi- 
cients of the items of an objective test in general psychology 
given at the end of a semester in his university. The validity 
coefficients for the items calculated from the errors made by 
the top 25 per cent and the lowest 25 per cent of a group of 
100 were found to correlate 0.45 with the validity coefficients 
calculated in a similar way from another group of 100 students 
who had had a similar background. The difficulty coefficients 
of these same items, namely the percentage failing on each 
item, when calculated from a group of 50 students, correlated 
with those calculated from other groups of 50 students to the 
extent of 0.96 and 0.97. 

Gibbons (2) obtained similar results to those of Carter in a 
study of the data provided by the Ohio district-state test in 
algebra for 1937 and 1938. The test consisted of 45 selected 
multiple-choice items designed to cover a first year of algebra 
instruction. This test was given to over 400 students in each 
of the years. The difficulty coefficients based on the 1937 
scores correlated 0.97 with the difficulty coefficients based on 
the 1938 scores. The correlation between the validity coeffi- 
cients found in 1937 and those in 1938 was found to be 0.58. 
It is possible that in the case of Gibbons’ data the reliability 
of the validity coefficients was low because the range of these 
coefficients was probably limited. 

For reasons to be discussed later, it seems probable that the 
conditions in both the Carter and the Gibbons study were such 
that the item validity coefficients are less likely to vary than 
they would in other circumstances. All Carter’s students 
were taught by the same instructor, and all Gibbons’ students 
had had a practically identical curriculum. Both of these con- 
ditions are likely to make the validity coefficients of the test 
items unusually stable from one occasion to the next. 

However, it is important to note that one might expect such 
item validity coefficients to be less reliable than the correspond- 
ing difficulty coefficients since the former are functions of two 
difficulty coefficients calculated from a limited sample of the 
population tested. 


SOME ADDITIONAL DATA 


The present study of the value of the customary measures 
of item validity is based upon data provided by the examina- 
tions in the course in general psychology offered to all students 
in Ohio State University. In order to interpret this data, it 
is important to understand the way in which this course is 
organized. 

The course is taken by large numbers of students each quar- 
ter, and the total group taking the course is rarely less than 
600. The work of teaching such a large group is divided 
among several instructors so that an individual instructor 
rarely has to teach a group greater than thirty in number. 
Some instructors take more than one class but there are usually 
as many as a dozen instructors teaching this course at any time. 
Although teaching practices vary from group to group, all 
groups follow a similar curriculum during any particular 
quarter and all groups use the same textbooks. It is impor- 
tant to note, however, that the instructors are allowed consid- 
erable freedom in the planning of their work and that conse- 
quently different instructors give different emphasis to the 
various points that the course covers. The groups also vary 
considerably from each other in ability, interest, and back- 
ground. All groups take the same midterm and final exam- 
inations and, since the groups differ, some groups achieve 
considerably higher average scores than do others on these 
examinations. The differences between groups usually remain 
consistent from the midterm examinations to the final examina- 
tion. The fact that these differences between groups on the 
examinations are not mainly due to teaching differences is 
indicated by the fact that on more than one occasion an in- 
structor has had both the group with the highest average score 
and the group with the lowest average score on the examination. 

The examinations are of the objective type and usually con- 
sist of about 100 items on the midterms and twice that number 
in the finals. About half the items in these examinations are 
of the true-false type and the remainder are of the multiple 
choice type with four or more choices. A few items have been 
used with three alternatives but these have not been included 
in the data for the present study. 

Until the Spring of 1941, the practice was to make up each 
new examination by drawing items from old examinations and 
adding to these old items whatever new items the instructors 
may have made up. However, since that date, a file of items 
has been compiled for all previous instances in which these 
items had been used in an examination. The item validities 
were calculated from the percentage passing the item in the 
25 per cent that scored highest and the percentage passing the 
item in the 25 per cent scoring lowest on the total test score. 
All items were included in the file except in those cases where 
the sum of the percentage passing in the high and low groups 
was less than 15. Many of the items in the file had been used 
on numerous occasions prior to the calculation of the item 
validity coefficients and from this data it is possible to find 
out the value of an item validity coefficient calculated on one 
occasion in determining what that coefficient will be when it is 
calculated on another occasion. 
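
The tabulation that feeds each item validity coefficient can be sketched as below. The source reads the item-total correlation from a published table, which is not reproduced here; the sketch stops at the two percentages and the filing rule. The function names, the handling of ties, and the rounding of the group size are assumptions made for illustration.

```python
def tail_percentages(total_scores, item_passed, tail=0.25):
    """Percentage passing an item in the highest and lowest scoring tails.

    total_scores: total test score per student.
    item_passed:  1 if the student passed the item, else 0 (same order).
    Returns (pct_high, pct_low). Ties at the cut are broken by input order.
    """
    n = len(total_scores)
    k = max(1, int(n * tail))  # size of each tail group
    order = sorted(range(n), key=lambda i: total_scores[i])
    low, high = order[:k], order[-k:]

    def pct(group):
        return 100.0 * sum(item_passed[i] for i in group) / len(group)

    return pct(high), pct(low)

def keep_in_file(pct_high, pct_low, min_sum=15.0):
    """Filing rule from the text: drop the item when the two
    percentages passing sum to less than 15."""
    return pct_high + pct_low >= min_sum
```

The pair returned by `tail_percentages` is what would be looked up in the table to obtain the coefficient; an item passed by almost no one in either tail is excluded by `keep_in_file` before any coefficient is recorded.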

Of all the multiple choice items in the file 50 had been used 
five or more times prior to the introduction of the systematic 
filing system. The total distribution of these coefficients had 
a mean at 0.53 and a standard deviation of 0.24, which indi- 
cates that their range is very appreciable. Only five of these 
coefficients were negative. The intraclass correlation between 
these coefficients was found to be 0.21. This suggests that such 
coefficients are practically valueless for the purpose of choosing 
between these items in spite of the wide range of positive values 
over which these coefficients scatter. It should be noted that 
the small size of this correlation is not due to limited variation 
in the validity coefficients. 
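
An intraclass correlation of this kind can be computed from a table with one row per item and one column per occasion of use. The sketch below uses the one-way analysis-of-variance form; the source does not say which computing formula was used, so this choice, like the function name, is an assumption.

```python
def intraclass_correlation(rows):
    """One-way intraclass correlation.

    rows: list of equal-length lists, one per item, holding that item's
    validity coefficient on each occasion of use.
    """
    n = len(rows)      # number of items
    k = len(rows[0])   # occasions per item
    grand = sum(sum(r) for r in rows) / (n * k)
    row_means = [sum(r) / k for r in rows]
    ss_between = k * sum((m - grand) ** 2 for m in row_means)
    ss_within = sum((x - m) ** 2 for r, m in zip(rows, row_means) for x in r)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
```

A value near 1 means an item's coefficient is nearly the same on every occasion; a value near 0, as with the 0.21 reported here, means that knowing an item's coefficient on one occasion tells little about its coefficient on another, however widely the coefficients scatter overall.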

It should be noted that this group of items represents a 
group that had been selected on numerous occasions as being 
good items. This correlation indicates then that the degree to 
which a test could be improved in internal consistency as a 
result of the use of item validity coefficients once the items 
have been selected by a group of competent judges is extremely 
small. It raises the question whether the labor of calculating 
such item validity coefficients is worth while. The cause of 
the high variability and low unreliability of item validity coef- 
ficients in the present instance will be discussed in the next 
section. 

Similar results were found on an examination of the 78 true- 
false items which had been used five times or more. The intra- 
class correlation between the five columns of validity coeffi- 
cients derived from these items was found to be 0.27, which 
again suggests that the value of such coefficients is limited in 
the kind of situation that has been described. Here again the 
low correlation is not a result of the lack of variability of the 
coefficients, for the mean of these coefficients was 0.55 and the 
standard deviation 0.21. 

DISCUSSION 


The evidence provided in the previous section indicates that 
in the kind of educational framework described the customary 
measures of the validity of test items are practically valueless 
for the purpose of test construction. When information test 
items are used on several occasions, the validity coefficients of 
these items will vary from one occasion to the next and the 
amount of variation that these coefficients will show will de- 
pend very largely on the nature of the educational program 
and its administration in the groups on which these coefficients 
are determined. It is probable that in the situation that has 
just been examined, which is a fairly common one, item 
validity coefficients will vary very considerably since 
there are numerous factors in the situation that are liable to 
be sources of variation. These sources of variation are fairly 
obvious but since they are usually omitted from a discussion 
of the problem, it seems worth mentioning them here. 

Firstly, it has been mentioned that in the case of the present 
data the various groups on whom the item validities were 
determined varied in intelligence and that all instructors did 
not give equal emphasis to all points. Consequently, there is 
an interaction between the intelligence of the group and the 
instruction it receives which may result in great variations in 
the validity coefficients of the examination items. For exam- 
ple, suppose it happens that an instructor with a relatively dull 
group of students emphasizes a point which the other instruc- 
tors hardly mention at all, the consequence is that if the 
examination includes a test item which refers to that particu- 
lar point, then that item will have a validity coefficient that is 
either negative or very small if it is positive. If it had hap- 
pened that the same instructor had had a relatively bright class 
then the item validity would probably have been large and 
positive. This source of variation of the customary measures
of item validity will become less as the number of groups and
the number of instructors is increased. However, in the kind
of educational program described, the number of classes
and the number of instructors must necessarily remain limited 
and consequently restrictions are placed on the value and re- 
liability of the usual measures of item validity. 

Secondly, in the elementary course in psychology at Ohio 
State University, the curriculum is being constantly modified 
and varies quite markedly from quarter to quarter and the 
instructors are encouraged to try out new approaches and new 
materials. Wherever such a situation occurs, one may expect 
a considerable variation in the validity coefficients of test items 
when they are determined on different occasions. When the 
curriculum is constantly under revision, facts and information
that are covered by most students and instructors during
one year may be covered by only a few during subsequent 
years. Reliable item validity coefficients indicate that the cur- 
riculum is standardized, inflexible and undesirable according 
to modern criteria of good teaching. It is probable that item 
validity coefficients might be of greatest value in building tests 
of the traditional subject matter areas where a rigid curricu- 
lum is still prescribed. 

Thirdly, when the validity coefficients of the items are cal- 
culated from the upper and lower 25 or 27 per cent in order 
to reduce the amount of labor to reasonable limits, the result- 
ing coefficients have relatively high errors of estimate unless 
the population from which they are derived includes a very
large number of cases. Fewer cases are required to achieve 
the same accuracy of estimation if ordinary biserial correla- 
tions are calculated instead, but the labor required to follow 
this procedure is usually much greater than that available. 
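
As an illustration of the upper-and-lower-group procedure, the sketch below computes a simple index for one item: the proportion passing in the top 27 per cent of total scores minus the proportion passing in the bottom 27 per cent. This difference index is an assumption made for the example; the source does not say how the extreme-group proportions were converted into a coefficient, and the function name and data are hypothetical.

```python
def upper_lower_index(total_scores, item_correct, fraction=0.27):
    """Proportion passing the item in the upper group minus that in the lower group."""
    n = len(total_scores)
    k = max(1, round(n * fraction))   # size of each extreme group
    # Indices of examinees ordered from lowest to highest total score.
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower, upper = order[:k], order[-k:]
    p_upper = sum(item_correct[i] for i in upper) / k
    p_lower = sum(item_correct[i] for i in lower) / k
    return p_upper - p_lower


# Hypothetical data: ten examinees, item passed only by the five highest scorers.
scores = list(range(1, 11))
passed = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(upper_lower_index(scores, passed))
```

Because only about half of the cases enter the computation, the index rests on fewer observations than a biserial correlation over the whole group, which is the source of the larger errors of estimate noted above.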

Finally, it should be noted that the general evidence from 
other researches as well as that given here indicates that the 
usual measures of the validity of information test items have 
a varying reliability and usefulness according to the situation 
in which they are derived and that they are probably less
valuable in many test construction situations than they are usually
considered to be. In the situation described here, which is a 
common one, it is questionable whether enough value can be 
derived from such coefficients to justify the labor and expense 
involved in their calculation, since items selected by judges 
will produce a test with almost the same degree of internal 
consistency.


MANIPULATIVE PERFORMANCE OF YOUNG
ADULT APPLICANTS AT A PUBLIC
EMPLOYMENT OFFICE—PART I


LORENE TEEGARDEN 
Department of Psychological Services, Public Schools, Cincinnati, Ohio 


A PROBLEM confronting the Cincinnati Employment
Center was the rating of applicants for certain types of
non-skilled jobs including factory assembling, machine 
operating, table work of all kinds, packing, wrapping, waiters’ 
and maids’ work, and kitchen and pantry helpers. Such jobs 
involve no special skills for which a long period of training 
is required, but do involve hand operations or manipulations
in which rapidity of movement, dexterity, coordination, atten- 
tion to details, and various degrees of adaptability are neces- 
sary. 

To aid in the rating several performance tests were selected 
and used. This report presents results of the (1) Kent- 
Shakow Industrial Form Board, (2) Minnesota Spatial Rela- 
tions Test, (3) Minnesota Rate of Manipulation Test, and (4) 
Cincinnati Plier Dexterity Test, which is an adaptation of 
the O’Connor Tweezer Dexterity Test. 


POPULATION TESTED 


Test records which form the basis of this report are those 
of white applicants between the ages of 16 and 25 years, who 
registered at the Employment Center during a period of two 
years. Almost all were unemployed at the time of registering. 
In addition to the typically unemployed population, the rec- 
ords include those of some college students who registered for 
work during the summer months, and of some members of the 

1 Part II will follow in the December issue of this Journal.

Employment Center staff, who had different amounts of
college training and work experience.

Manipulative tests were given, in so far as possible, to all 
applicants for jobs which would involve manipulative work, 
excluding those men who had actual experience in any of the 
skilled trades. These were given, instead, verbal trade tests 
covering the techniques of their trades. Many of these men, 
however, were above 25 years of age, and even if tested, would 
not have fallen within the limits of this report. Office and 
sales workers who took the tests were limited to those who 
were willing to work in office, store, factory, restaurant, or 
wherever they could get a job. Those who would consider 
only office or sales work were not included. Laborers tested 
were limited to those who were considered by the interviewers 
to be possible candidates for factory work. 

The norms developed at the Cincinnati Employment Center 
were based upon the performance of the population described, 
with a limitation of age to the decade between the 16th and 
25th birthdays. The purpose was to select workers for 
manipulative jobs. Though older people apply for such jobs, 
they usually compete with young adults, and many employers 
prefer the younger workers. For this reason it seemed logical 
to establish norms for the decade 16 to 25 years. Such a 
limitation had two additional advantages: (1) it reduced to 
a minimum the influence of age upon performance, and (2) 
the applicants within this age decade were probably more 
nearly representative of the entire working population than 
were those of any other available age group. This group in- 
cluded young women who after marriage might cease to work 
outside the home, and young people who would later specialize 
in skilled, commercial, or professional occupations, as well as 
many who applied for manipulative jobs but who were quali- 
fied for little else but common labor. 


TESTS USED 
Kent-Shakow Industrial Form Board. 
This is a form board developed by Grace H. Kent and David
Shakow from the Worcester Form Boards, and first reported
in the Personnel Journal in 1928 (4). 

The equipment consists of a board 11 by 23 inches, in which 
are five holes of approximately the same size but different 
shapes. Eight sets of blocks for these holes are presented as 
a series of eight tasks, graded in difficulty from a simple intro- 
ductory practice task in which there are two blocks for each 
hole, to a difficult and complicated task in which five blocks 
fit together to fill each hole. Each task is designated by the 
number of blocks to the hole, together with the character of 
the cut, whether straight or diagonal, as 2-diagonal (2D) or 
4-straight (4S). The second 4-diagonal pattern is
distinguished from the first one by its lettering (4D and 4DD).
Each task presents one problem of the series, and in each task 
the blocks for each of the five holes are cut in the same way, 
so that the problem is presented five times in the five holes. 
This minimizes the possibility of chance solutions. In each 
task blocks for all five openings are presented together in a 
random arrangement, and though they may be correctly 
placed by chance in one or two of the holes, this very rarely 
happens for all five openings in any task. Inevitably the sub- 
ject is delayed in the solution of at least two or three of the 
openings by the fact that it is necessary to turn, shift, rear- 
range, and manipulate the blocks in order to fit all into their 
proper places. The test is self-corrective in that it is not pos- 
sible to complete the task until all blocks are placed correctly. 

Observation of the performance of many adults on the Kent- 
Shakow form board led to a division of the tasks into a speed 
group or series consisting of the simple problems (2D, 3S and 
4S) and an adaptation series consisting of the complicated 
problems (3D, 4D, 4DD, and 5D). This division was based
upon the observation that for many adults the time required 
to complete the tasks of the speed series depends largely upon 
rapidity of movement rather than upon the ease of solution of 
the problems. It is true that to a small percentage of
applicants even the simplest tasks appeared to be a difficult learning
problem. But for the large majority of adults the problems
did not become difficult until 3D or 4D were presented. For
the more capable subjects, even 3D seemed to be a speed test
rather than a problem of adaptation, since they could place
the blocks almost as rapidly as they could pick them up. But 
since to most adults 3D presented a problem which made them 
pause to consider how it should be solved, it was considered 
to belong to the group of complicated, or adaptation problems. 

In solving problems of the complicated group rapidity of 
movement seemed to be a minor factor, since it was frequently 
observed that a slowly moving person who made a low rating 
on the simple problems made an increasingly high rating on 
the problems of the adaptation or complicated group, owing 
to the readiness with which he could react to the complications 
which those problems presented. 

All ratings are percentiles. Separate norms were developed 
for men and women. The time required for completion of 
each task was given its appropriate percentile rating. The 
rating for simple problems is the mean of ratings made on 
the three tasks of that series; and the ratings on all tasks 
taken in the complicated series are averaged to get the rating 
for that series. All except a few of the poorest applicants 
took all the simple problems and at least one (3D) of the 
complicated series. The more difficult of the complicated 
problems were given only to those who completed the simpler 
tasks within prescribed time limits. The Kent-Shakow test 
requires 20-40 minutes to give, with an average of about 30 
minutes. With two boards and one set of blocks two people 
may be tested at the same time. 
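
The rating procedure just described can be sketched as follows. The decile times are the male norms for 2D, 3S and 4S from Table 1; the lookup rule (an applicant receives the highest tabled percentile whose time he equals or betters, and 0 if slower than the 10th-percentile time) is an assumption, since the source does not say how times falling between tabled values were rated. The function names are hypothetical.

```python
# Male decile norms in seconds for the three simple tasks, from Table 1.
NORMS = {
    "2D": {100: 16, 90: 23, 80: 26, 70: 29, 60: 32, 50: 34, 40: 37, 30: 41, 20: 45, 10: 55},
    "3S": {100: 25, 90: 43, 80: 50, 70: 56, 60: 61, 50: 67, 40: 74, 30: 83, 20: 96, 10: 118},
    "4S": {100: 35, 90: 65, 80: 71, 70: 78, 60: 86, 50: 94, 40: 104, 30: 117, 20: 128, 10: 152},
}


def percentile_rating(seconds, task_norms):
    """Highest tabled percentile whose time the applicant equals or betters."""
    for pct in sorted(task_norms, reverse=True):
        if seconds <= task_norms[pct]:
            return pct
    return 0  # slower than the 10th-percentile time (assumed floor)


def simple_series_rating(times):
    """Mean of the percentile ratings on the three simple tasks."""
    ratings = [percentile_rating(times[task], NORMS[task]) for task in ("2D", "3S", "4S")]
    return sum(ratings) / len(ratings)


# An applicant at the tabled median time on every simple task.
print(simple_series_rating({"2D": 34, "3S": 67, "4S": 94}))  # 50.0
```

The rating for the complicated series would be formed in the same way, averaging over whichever of 3D, 4D, 4DD and 5D the applicant actually took.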


Minnesota Spatial Relations Test. 


This test was developed by the Employment Stabilization 
Research Institute of the University of Minnesota, from an 
earlier form board devised by H. C. Link (3). It consists of 
a series of four boards, each with 58 holes of different shapes 
into which blocks are to be fitted, one to each hole. For the
first two boards, A and B, the same blocks are used, but with a
different arrangement and spatial orientation. The last two 
boards, C and D, use a second set of blocks, smaller and some- 
what more difficult to place than those of A and B. 

The percentile rating for the Spatial Relations Test is based 
upon the total time required to complete B, C, and D. Board 
A is used as an introduction. The test measures speed and 
accuracy in reacting to observed details of spatial relations. 
It gives a measure of ability to work with a variety of details, 
in a situation involving more of routine operations than of 
problem solving. It differs from the Kent-Shakow test in that 
the latter presents a graded series of problems covering a wide 
range of difficulty; and in that the Spatial Relations test 
depends largely upon discrimination of spatial details, 
whereas the Kent-Shakow, in its upper levels, depends upon 
resourcefulness and adaptation of procedure as essential fac- 
tors, in addition to perception of form and position. The 
Spatial Relations test requires 12 to 20 minutes. With one 
set two people may be tested in 35 minutes or four in about 
one hour. 


Minnesota Placing and Turning. 


This name was adopted at the Center to indicate the two 
parts of Minnesota Rate of Manipulation Test, which was 
developed by W. A. Ziegler (2). 

The board used at the Center contained 60 round holes 
arranged in four straight rows, into which fitted 60 identical 
blocks with 3/16 inch clearance. This differed somewhat from 
the standard board, in which the blocks have but 1/16 inch 
clearance. 

In these tests every effort was made to eliminate the learn- 
ing factor. The procedure was carefully taught, and the 
board was filled once for practice. For all except a few 
applicants of very poor ability the practice performance 
sufficed for mastery of the procedure, though a few were 
found who could not adapt readily to a new procedure even 
as simple as those used in the Placing and Turning tests.

The first task, Placing, measures rapidity of movement of
the hand and arm when one hand is used alone in a simple 
repetitive process. The blocks are placed, one at a time, by 
a procedure which requires that each successive movement 
shall be of a different length from that which precedes or
follows it. The rating is based upon the total time required for
trials 2, 3, 4 and 5, the first trial having been used for practice. 

The second task, Turning, measures finger manipulation 
and control in a procedure requiring two-hand coordination. 
Each block is picked up from the hole with one hand, turned 
over, and replaced with the other hand. The subject moves 
along alternate rows in opposite directions, changing hands 
with each change of direction so that on one row he picks up 
with the left hand and places with the right, and in the fol- 
lowing row the right hand picks up and the left places the 
blocks. As in Placing, the percentile rating is based on trials 
2, 3, 4 and 5. People sometimes make widely different ratings
on Placing and Turning. 

Ten or 15 minutes are required for the two tests. They
may be given to one applicant while another is working at
another table on Spatial Relations or on the latter problems
of the Kent-Shakow test.


Cincinnati Plier Dexterity Test. 


This is a modification of the Tweezer Dexterity Test devel- 
oped by Johnson O’Connor (5). The equipment includes a 
small square board covered with a brass plate and containing 
ten rows of ten holes each; small brass pins which fit into the 
holes with .01 inch clearance; a tray to hold the pins; and 
pliers² which were identical with those used by some of the
industries of Cincinnati for assembly work. Pliers seemed 
more suitable than tweezers for testing for industrial jobs in 
Cincinnati. 

2 The plier used was #1033-6, made by the Crescent Tool Company of
Jamestown, N. Y.

The Plier Dexterity Test gives a measure of exact neuromuscular
control essential in jobs which involve the exact placing
of small objects. The applicant was carefully instructed
and was shown how to hold the pliers and to place the pins.
Time was kept separately for each half of the board (called 
Pliers 1 and Pliers 2). Some applicants increased or de- 
creased their ratings on Pliers 2, while some made similar 
ratings on both halves. In its requirement of exact movement, 
the Plier Dexterity test measures an ability not measured by 
other tests in the battery. 

Eight to twelve minutes are required for the test. It may 
be given while another applicant is working at another table 
on the Spatial Relations or on the latter part of the Kent- 
Shakow test. 

USE AND INTERPRETATION OF TESTS 


Kent-Shakow Industrial Form Board as a Measuring Instru- 
ment. 

The Kent-Shakow test is valuable to introduce the battery, 
if it is to be given to all applicants. It is important to begin 
with a test which challenges the interest and calls forth the 
best effort. This is especially true of aptitude tests as con- 
trasted with tests for specific skills, such as typing or blue- 
print reading. Applicants usually try their best in a skill
test, but they may be a bit scornful of fitting blocks together 
or doing any of the sort of tasks which may be included in an 
aptitude test. 

Though the first tasks of the Kent-Shakow test were easy, 
the applicant quickly realized that they were becoming more 
difficult, and soon he responded to the problems with care and 
attention. He then was willing to give his best effort to other 
tests, even though they might appear simple and easy. If 
the simpler tests were presented before the Kent-Shakow, a 
scornful or flippant attitude might prevent the applicant from 
doing his best on them, and as a result the validity of that part 
of his record was decreased. A competent examiner recog- 
nizes such an attitude, and should never fail to mark such a 
record as of questionable validity. 

After the introductory task (2S) the others are given in the
order: 2D, 3S, 3D, 4S, 4D, 4DD, 5D. When 3D is reached, if
not before, the applicant realizes that he is encountering diffi- 
culties. If 3D is very difficult, as it is for those of low average 
or poor ability, he is encouraged by being told that it is harder 
and is expected to require longer for completion than the earlier 
tasks. The next task, 4S, is easier, and restores the
applicant’s confidence. He can then stop with a feeling of success;
or if he is to continue with 4D he faces it with renewed confi- 
dence, but with the expectation that it may be difficult. If 
he has not exceeded 210 seconds on 3D he proceeds to 4D. 
If he exceeds 10 minutes (600 seconds) on 4D he stops with 
that task, unless his performance has been such that he seems 
to have grasped the problem suddenly and has solved the last 
two or three holes quickly after long and ineffective efforts. 
In such a case 4DD is given in order to determine whether 
he is able to profit by experience. An applicant who after 
ten minutes is still baffled by 4D and who continues to fumble 
without showing insight into the problem, should complete 4D 
and, with the praise and satisfaction which that brings, should 
go no further. 

Task 4DD presents a problem similar to that of 4D. The 
writer values it for its indication of the applicant’s ability to 
learn spontaneously from his own experience without the aid 
of instruction or suggestion from a supervisor. Most subjects 
gain time on 4DD as compared with 4D. If they find it as
baffling as 4D they fail to gain, or they gain so little that the 
percentile rating drops below that made on 4D. 

If 4DD is completed in five minutes or less, it is followed by 
the concluding task, 5D. This is sufficiently difficult to chal- 
lenge engineering students or any superior adult; yet it may 
be successfully completed by mechanics, dressmakers, or by 
anyone who sees readily how the angles and curves of different 
shapes may be fitted into one another. 
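
The stopping rules given above can be summarized in a short sketch. The time limits (210 seconds on 3D, 600 seconds on 4D, 300 seconds on 4DD) are those stated in the text; the function name, its arguments, and the reduction of the examiner's judgment of sudden insight to a single flag are assumptions made for illustration.

```python
# Tasks every applicant is expected to attempt, in the order of administration.
BASE_TASKS = ["2S", "2D", "3S", "3D", "4S"]


def tasks_administered(time_3d, time_4d=None, time_4dd=None, sudden_insight=False):
    """Return the tasks given, using the time limits stated in the text.

    Times are in seconds; None means the task has not been taken.
    sudden_insight stands in for the examiner's judgment that the applicant
    grasped 4D suddenly after long and ineffective efforts.
    """
    given = list(BASE_TASKS)
    if time_3d > 210:                  # too slow on 3D: stop after 4S
        return given
    given.append("4D")
    if time_4d is None:
        return given
    if time_4d > 600 and not sudden_insight:
        return given                   # completes 4D and goes no further
    given.append("4DD")
    if time_4dd is None:
        return given
    if time_4dd <= 300:                # five minutes or less on 4DD
        given.append("5D")
    return given


print(tasks_administered(150, 400, 280))
```

An applicant who needed 250 seconds for 3D would stop after 4S, while one who took 700 seconds on 4D would go on to 4DD only if the examiner noted a sudden grasp of the problem.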

The record kept for the Kent-Shakow test included a profile 
graph showing the percentile rating for each task, and the 
mean ratings for simple problems and for complicated
problems. This is the quantitative record. A place is provided
also for recording qualitative aspects of the performance, and 
space for additional comments and interpretation. The profile 
shows at a glance the level and direction of the performance, 
whether rising or falling with problems of greater complica- 
tion, or maintaining a level throughout. The full value of 
the Kent-Shakow test cannot be understood or utilized to the 
best advantage unless the record includes both qualitative and 
quantitative data. 

Norms for the Kent-Shakow test which were developed at 
the Cincinnati Employment Center are given in Tables 1 and 2. 
They are based upon the performance of 530 young adult white 
males between the ages of 16 and 25 years, and of a similar 
group of 350 females. All took the three simple tasks (2D, 
3S, 4S) and the first of the complicated tasks (3D). For
almost 30 per cent of the men and 40 per cent of the women 
it was impractical to go further, because of the time factor and 
because of the ill effect on the applicant of problems too diffi- 
cult for him. An additional 10 per cent of each sex dropped 
out after 4D, and another 15 or 20 per cent after 4DD. 

The difference between the sexes in their performance on 
the simple tasks is insignificant. For 3D, however, the differ- 
ence between the means is 3.5 times its standard deviation, in- 
dicating a significant difference. A comparison of the upper 
portions of the distributions for the more complicated tasks 
suggests that for those also there is a significant difference 
between the sexes. The sex difference which appears in these 
complicated problems of the Kent-Shakow test is the only 
significant sex difference found in the entire battery of tests, 
with one exception. In the first half of the Plier Dexterity 
test a significant sex difference was found, which disappeared 
in the last half of the test. This was believed to be due to the 
relative unfamiliarity of women with the handling of a plier, 
and their need for a practice period before attaining their best 
speed. It was believed also that the sex difference found in 
the more complicated tasks of the Kent-Shakow test was the


TABLE 1

Kent-Shakow Form Board
Percentile Distribution of Time Required for Completion of Each Task
White Males. Ages 16-25

Task                    2D      3S      4S      3D      4D     4DD      5D
No. cases              530     530     530     530     377     332

Percentile                          Time in seconds

100                     16      25      35      45      85
95                      19      38      58      71     173
90                      23      43      65      81     202
85                      25      47      68      89     231
80                      26      50      71      95     267
75                      27      53      75     107     304
70                      29      56      78     116     334
65                      30      58      82     122     365
60                      32      61      86     131     399
55                      33      64      90     139     437
50                      34      67      94     146     478
45                      36      70      98     159     558
40                      37      74     104     170     646
35                      39      78     110     188     782
30                      41      83     117     215    1201
25                      43      90     122     246
20                      45      96     128     280
15                      50     106     139     322
10                      55     118     152     388
05                      70     150     189     485
00                     300     900     700    1000

Mean                  38.5    79.6   105.9   197.7
S.D.                 21.03   55.54   52.71  139.00
Skewness               4.5    13.5    15.0    84.0
S.D. of skewness      0.48    1.69    1.98    6.80





Note.—For the last three tasks (4D, 4DD, 5D) only the upper portion 
of the distribution shown for the first four tasks was tested. What the 
records would have been for the untested portion of the total distribution 
is unknown. It is assumed that those who fell below certain points on 
3D and 4S would have been in the lower part of the range on the more
difficult tasks also. Means and medians were not computed for those who 
did take the more difficult tasks, because no two such groups could be com- 
pared with each other, nor could they be compared with any other group 
studied.


TABLE 2

Kent-Shakow Form Board
Percentile Distribution of Time Required for Completion of Each Task
White Females. Ages 16-25

Task                    2D      3S      4S      3D      4D     4DD      5D
No. cases              350     350     350     350     215     176     131

Percentile                          Time in seconds

100                     15      25      45      45      85     110     175
95                      20      42      60      78     176     154     293
90                      25      47      67      93     211     178     344
85                      26      50      71     103     258     198     419
80                      28      52      75     113     304     217     496
75                      29      55      78     124     335     239     589
70                      30      58      82     134     370     267     725
65                      31      61      86     144     414     302    1037
60                      32      64      90     156     468     354
55                      34      67      94     171     537     425
50                      35      71      98     188     618     725
45                      37      74     102     207     721
40                      39      78     107     227    1068
35                      41      82     112     242
30                      43      86     117     258
25                      45      92     123     287
20                      50     100     130     326
15                      56     107     140     371
10                      64     115     152     425
05                      94     144     189     575
00                     250     750     900    1200

Mean                  41.6    82.7   112.7   236.5
S.D.                 24.20   64.58   84.00  177.00
Skewness               9.5    10.0    10.5    71.0
S.D. of skewness      1.08    1.89    2.35    9.20





See Table 1, Note. 


result of a difference in the experience of boys and girls with 
problems or projects of constructing, assembling, and putting 
things together. 
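
The ratio quoted for task 3D can be checked from the 3D means, standard deviations, and case counts as read from Tables 1 and 2. The sketch below is a check under an assumption: the source does not give its formula for the standard deviation of the difference, so the usual expression for two independent groups is used, and the function name is hypothetical.

```python
from math import sqrt


def critical_ratio(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Difference between two means divided by the standard error of that difference.

    Assumes independent groups: SE = sqrt(sd_a^2 / n_a + sd_b^2 / n_b).
    """
    se_diff = sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return abs(mean_a - mean_b) / se_diff


# Task 3D: males (Table 1) against females (Table 2).
ratio_3d = critical_ratio(197.7, 139.00, 530, 236.5, 177.00, 350)
print(round(ratio_3d, 1))  # 3.5

# Task 2D, one of the simple tasks, for comparison.
ratio_2d = critical_ratio(38.5, 21.03, 530, 41.6, 24.20, 350)
print(round(ratio_2d, 1))
```

Under this assumption the 3D figure agrees with the 3.5 reported in the text, while the ratio for 2D is considerably smaller.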
The curves for all the Kent-Shakow tasks were skewed
toward the slow end of the distribution. For the simple tasks 
and for task 3D the measure of skewness* is from 5 to 12 times 
its standard deviation, which indicates that the distributions 
are significantly skewed. The more complicated tasks would 
show still greater skewness. Enough is known of these dis- 
tributions to indicate that for 4D the tail would include 35 


[Figure: frequency curve for 530 males.]

Fig. 1. Distribution of performance time on task 2D of Kent-Shakow
test.


to 40 per cent of the total; for 4DD, 40 to 50 per cent; and for
task 5D, the tail of the curve would include 50 to 65 per cent 
of the entire distribution. Thus it is apparent that this 
graded series of tasks differentiates through a wide range the 
ability or complex of abilities which the test measures. The 
simple tasks are valuable at the lower ranges, while the more 
difficult tasks differentiate abilities in the upper ranges which
are beyond the reach of a large percentage of the total group.

The value of the test as an aid in vocational guidance or in 
selecting workers will depend upon the extent to which the 
ability or complex of abilities which is measured by the test 
varies among different occupations. 

What are the abilities which influence performance on the 
Kent-Shakow test? 

A trained clinician, watching the performance of many 
subjects, observes several factors: some physical, some mental 


[Figure: curves for 530 males and 350 females; horizontal axis, time in seconds.]

Fig. 2. Distribution of performance time on task 3D of Kent-Shakow
test.


or intellectual, and some depending upon personality or emo- 
tional make-up. Though instructions to the subject have 
been evolved through several years of use, to the end that 
every word and phrase should convey its meaning and call 
forth the desired response as effectively as possible, yet the 
test is not dependent upon language when used with those who 
are not themselves dependent upon communication by lan- 
guage. Presentation of the blocks with the aid of gestures 
and the examiner’s facial expression is sufficient to make deaf 
subjects understand what is desired; and as the tasks become
more complicated they seem to be self-explanatory without the
aid of language. This is more true of the Kent-Shakow than
of any other test in the battery.⁴

The most important physical factor influencing the perform- 
ance rating is rapidity of movement, which seems to affect 
appreciably the performance on simple tasks. This factor 
loses its influence increasingly in the group of complicated 
problems. Ability to use only one hand, as in cases of ampu- 
tation or crippling, seems negligible as a factor. Many people 
with two normal hands worked chiefly with one hand alone. 
Vision as a physical factor seems negligible. The blocks are 
large and the forms distinct, so that only in the most extreme 
cases would poor vision be a handicap in the performance.

Several non-physical factors are important. The first of 
these is the ability to react to the shape of the block and the 
hole in relation to each other. The examiner is amazed at 
times to see people attempt to fit a triangular block into a 
triangular hole, fitting angles to sides instead of matching 
sides to sides and angles to angles, or to fit the curved side of 
a block to the straight side of a hole, or vice versa. Some 
subjects place the block in the hole and examine it later if it 
does not fit, or lay it down again without examining it at all, 
and try another in the same way. Perhaps the applicant holds 
a block over a hole slightly out of the correct position, and 
abandons the attempt without perceiving that the block would 
fit if rotated slightly in one direction or the other. Or he 
may hold the correct block in reverse position over a hole, and 
fail to see that it would fit if reversed. On the other hand,
some see quickly how a block would appear if turned over, 
or that two shapes if placed together will exactly fit a given 
hole. The ability to react readily to details of size, form, and 
position is an important factor in performance time. 

A second non-physical factor is the ability to vary methods
and try new ones to attain a solution. Lack of this ability 


4 Full directions for the test as used by the writer may be obtained by 
addressing the author through the JOURNAL OF APPLIED PSYCHOLOGY.

produces perseverative errors. Some subjects attempt to
place a block into a hole repeatedly in exactly the same way 
with exactly the same failure, until the attention shifts to 
another hole, where perhaps the same method is repeated, and 
again with failure. Others vary their attack quickly, trying 
one method after another with eventual success. In attacking 
another hole they may begin in the same way and go through 
a succession of methods to the correct one; or they may jump 
from the early attempts to the final and successful method;
or, in exceptional cases, they may use only the successful 
method after discovering it. The types of reactions just de- 
scribed seem to be associated with different degrees of the 
ability to vary or modify methods in solving new problems, 
that is, in solving the type of problems presented in the Kent- 
Shakow test. This is learning or adaptability, especially the 
ability to learn spontaneously and without direction, super- 
vision, or instruction. 

A third non-physical factor, as seen in clinical observation, 
is more difficult to describe. It involves steadiness, poise, con- 
centration, and emotional reaction to success or frustration. 
Some subjects retain their poise under difficulties and concen- 
trate the attention more intensely. Others lose their poise 
and show irritation, with a resultant shifting of the attention 
and loss of concentration. The attention seems to shift in part 
from the problem to the ego. Pride and confidence are dis- 
turbed, and the subject gives the impression of one who is 
hampered by fear—perhaps fear that his prestige will suffer. 
Some become so disturbed that they are unwilling to go fur- 
ther. The effectiveness of their reactions to the immediate 
problem is impaired by such divided attention or by the con- 
flict of fear. Fortunately it is not difficult for the understand- 
ing examiner to stimulate continued effort to the point where 
confidence is regained, so that the subject is able to stop with 
a feeling of better success or to continue with better poise. 

All three of the non-physical factors described seem to the 
writer to be important in performance of the Kent-Shakow 
test, and in the interpretation of that test performance in 
terms of the requirements of a job or vocation. People who 
possess poise or adaptability may show poor perception of 
form. Some who show good perception and adaptability may 
lack poise under difficulties. Others may exhibit little besides 
poise and placidity. And occasionally one is found who shows 
excellent form perception, quick adaptability, fine poise, and 
in addition is quick in his movements. Such a person may 
prove to have high school or college training, or he may have 
no more than eighth grade schooling. It may be a man or a 
woman, with or without vocational training or job experience. 

The factors described have been analyzed from clinical ob- 
servation. They have not been treated statistically by the 
method of factor analysis. The numerical rating of the test 
performance is the result of the interaction of all factors in- 
volved. Separate characteristics observed should be noted 
on the record, and interpreted by the examiner in relation to 
performance on other tests in the battery. The examiner’s 
notes on the test performance have been considered by those 
who work with the test results to be as valuable as, or more 
valuable than, the numerical rating. 

Two ratings are secured from the Kent-Shakow test. The 
correlation between these ratings was computed for 600 males 
and 500 females, including ages from 16 to 60 (though with 
comparatively few over 45) and a small percentage of negro 
as well as white applicants. The results are presented in 
Table 3. Those who took the entire seven tasks, it will be re- 
membered (Group IV, Table 3), constitute only the upper 40 
per cent of the males and the upper 33 per cent of the females, 
while those who took four tasks or more (Group I) include 
the entire range of ability tested. The correlation between 
ratings made on simple and complicated problems is definitely 
positive, but is not extremely high, even when the entire range 
of ability is included, as in Class I of Table 3. 

TABLE 3 

Kent-Shakow Form Board 
Correlation between Percentile Ratings on Simple 
and Complicated Problems 

[".." marks an entry illegible in this copy.] 

                                        Males                  Females 
                                 No.  Per cent    r      No.  Per cent    r 
                                      of total                of total 

  I.   Completed 4 tasks or more 600    100      .72     500    100      .71 
  II.  Completed 5 tasks or more 407     67      .58     280     58      .44 
  III. Completed 6 tasks or more 349     58      .54     231     46      .41 
  IV.  Completed 7 tasks          ..     40      .49     159     33      .36 


Table 4 gives the mean correlation of each task with the 
other tasks in the same group, and the mean correlation of 
each task with all the tasks of the other group for each class 
of the people tested. Thus for Class I, i.e., for the entire 
distribution including all people tested (all took at least four 
tasks of the series), task 3D has a mean correlation of .57 (for 
males) or .55 (for females) with tasks of the simple group. 
Since 3D was the only complicated task taken by all of this 
group, there is no other complicated task with which it can be 
correlated. For Class II, those who completed at least five 
tasks, 4S correlates .49 with other tasks of the simple group 
(.31 for females), and .43 with the complicated tasks which 
were taken by all in this class (.20 for females), namely, 3D 
and 4D. 

TABLE 4 

Kent-Shakow Form Board 
Mean Correlation of Each Task with Other Tasks in Same Group; Mean 
Correlation of Each Task with All Tasks of the Other Group; 
and Mean Intercorrelation of All Tasks within Each Group 

[Only the simple-problem columns can be read in this copy. The No. 
column, the columns for the complicated problems (3D, 4D, 4DD, 5D), 
and the mean intercorrelation (Column 10) are illegible and are 
omitted; ".." marks an illegible entry.] 

  Class   Tasks        Correlated       Simple problems 
          completed    with             2D      3S      4S 

  Males 
  I.      4 or more    Simp.           .56     .57     .61 
                       Comp.           .56     .58     .57 
  II.     5 or more    Simp.            ..     .42     .49 
                       Comp.            ..     .39     .43 
  III.    6 or more    Simp.            ..     .38     .46 
                       Comp.            ..     .27     .31 
  IV.     7 tasks      Simp.           .40     .36     .45 
                       Comp.           .26     .25     .28 

  Females 
  I.      4 or more    Simp.           .55     .53     .53 
                       Comp.           .58     .53     .55 
  II.     5 or more    Simp.           .34     .27     .31 
                       Comp.           .28     .17     .20 
  III.    6 or more    Simp.           .33     .27     .26 
                       Comp.           .25     .17     .17 
  IV.     7 tasks      Simp.           .38     .29     .26 
                       Comp.           .24     .16     .16 

Table 4 is read as follows: For males of class I, those completing four 
tasks or more (which includes the entire group tested), task 2D has a 
mean correlation of .56 with other tasks of the simple group (3S and 4S) 
and .56 with 3D, which is the only complicated task taken by the entire 
class I. For males of class IV, who completed all seven tasks, 2D has a 
mean correlation of .40 with other tasks of the simple group, and .26 with 
the tasks of the complicated group. For this same class IV, task 3D has 
a mean correlation of .23 with all tasks of the simple group and .16 with 
the other tasks of the complicated group. The simple group includes 2D, 
3S, and 4S. The complicated group includes 3D, 4D, 4DD, and 5D. 


Examination of Table 4 reveals several relationships which 
should be noted: 

1. For the entire range of people tested (Class I) the mean 
correlation of each task with other simple tasks is about the 
same as its correlation with task 3D of the complicated group. 

2. For classes II, III, and IV, simple problems correlate 
more closely among themselves than do complicated problems 
(Column 10). Apparently the complicated tasks measure different 
aspects of an ability which is more complex than what 
the simple tasks measure. 

3. Simple tasks correlate with other simple tasks more 
closely than with complicated tasks. 

4. For men, the complicated tasks also correlate more closely 
with simple tasks than with the other complicated tasks. For 
women there is a very slight tendency for complicated tasks 
to correlate more closely with complicated than with simple 
tasks. 

These facts suggest several possibilities. Whatever is mea- 
sured by the simple tasks may be so fundamental that it is 
the most essential element of the ability to solve complicated 
as well as simple problems. If this is true, tasks 4D, 4DD 
and 5D may be superfluous, adding little to the test results. 
Or ability to solve complicated problems may be so very com- 
plex that tasks which measure various aspects of that ability 
may of necessity be more unlike one another than they are 
unlike the simple tasks which measure the fundamental ability. 

Further study of the separate tasks of the Kent-Shakow 
test in their relation to an acceptable criterion of mechanical 
ability, as manifested in performance on the job, is necessary 
in order to determine the relative validity of the separate 
tasks or groups of tasks. 

When the Kent-Shakow test is supplemented by other tests 
which involve simple operations and afford the opportunity 
to observe the applicant’s attitude toward easy tasks, the Kent- 
Shakow test may be shortened, for those who complete 2D in 
30 seconds or less, by passing immediately to 3D. This can 
be done in about 30 per cent of the cases. If, then, 3D is com- 
pleted in 100 seconds or less (approximately 15 or 20 per cent 
of the entire distribution), 4D may follow immediately after 
3D. If more than 100 seconds are required for 3D, then 4S 
should be given before continuing with 4D. If 3D has re- 
quired more than 210 seconds after the omission of 3S, it 
should be followed by 3S and then 4S as a concluding test. 
These omissions would make but small savings in time, but 
they would eliminate some of the very simple tasks from the 
testing of people in the upper ranges of the distribution. 
Shortening of the test by this method, however, would be useful 
only if further study of the test shows that the more complicated 
tasks are more useful than the simple tasks for measuring 
ability to solve complicated mechanical problems. 
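
The branching rules for the shortened form can be written out as a 
small decision routine. The following is only an illustrative sketch 
in Python. The function name and the returned task lists are 
assumptions of this example; the thresholds of 30, 100, and 210 
seconds are those given in the text; and the later tasks (4DD, 5D) 
are left out because the text does not restate when they follow. 

```python
def short_form_tasks(seconds_2d, seconds_3d=None):
    """Return the tasks to give after 2D under the short-form rules.

    Hypothetical helper: times are in seconds, and seconds_3d=None
    means that 3D has not yet been given.
    """
    if seconds_2d > 30:
        # 2D took more than 30 seconds: the short form does not apply.
        return ["complete form"]
    if seconds_3d is None:
        # 2D done in 30 seconds or less: pass immediately to 3D.
        return ["3D"]
    if seconds_3d <= 100:
        # Fast on 3D: 4D follows immediately.
        return ["3D", "4D"]
    if seconds_3d <= 210:
        # Slower on 3D: give 4S before continuing with 4D.
        return ["3D", "4S", "4D"]
    # More than 210 seconds on 3D: conclude with 3S and then 4S.
    return ["3D", "3S", "4S"]


print(short_form_tasks(25, 90))
print(short_form_tasks(25, 150))
print(short_form_tasks(25, 240))
```

Treating "or less" as an inclusive limit is a choice of this sketch 
and follows the wording of the rules. 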

Records of cases in the correlation groups were examined in 
order to compare the mean rating on all tasks taken with the 
rating that would have been made if 3S or 4S or both had 
been omitted. Among those who completed 5D, approximately 
five sixths of both men and women had completed 2D 
in 30 seconds or less, and might have been given the shorter 
form had it been in use at the time. These records were 
re-rated according to the rules for the short form, and the 
resultant mean rating was correlated with the mean rating made 
on the complete form. For the 192 men included in the 
re-rating, the correlation between the two forms was .85, and 
for 136 women it was .78. A similar analysis was made for 
the groups completing 4DD or beyond, which of course includes 
the groups completing 5D, already mentioned. For 
these the correlation between the short form and the complete 
form was .87 for the men and .83 for the women. 


PROCEDURE FOR OBTAINING SIX PART 
SCORES FROM ANSWER SHEETS IN 
ONE RUN THROUGH THE IBM 
TEST SCORING MACHINE¹ 


GEORGE B. SIMON 
Air Corps Headquarters, Fort Worth, Texas 


INTRODUCTION 


WITH the development of objective or new type tests 
and their increased use in large-scale surveys, 
research projects, and school testing programs, the 
need for rapid and accurate scoring of test papers became 
more apparent. As this problem was more and more satisfactorily 
solved, the use of reliable measures of mental traits 
and academic achievement on a large scale was encouraged 
and therefore further increased. As a result of the recognized 
need for the rapid and accurate scoring of tests the 
International Business Machines Corporation developed its 
test scoring machine. The machine is electrically controlled 
and employs a special answer sheet which must be marked 
with a pencil containing a special kind of lead that is 
electrically conductive. The use of this machine has enabled scoring 
to be done, for example, at the rate of 400 papers an hour 
for tests containing 150 questions with five choices each where 
the score is obtained from the number of right answers minus 
one fourth of the number of wrong answers. In the procedures 
described below as many as 1500 scores were obtained 
in one hour. Not only is the speed greatly increased over that 
for hand scoring but also the accuracy is greatly increased. 
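
The scoring rule mentioned above, rights minus one fourth of the 
wrongs on five-choice items, can be illustrated with a short 
calculation. The sketch below is in Python. The function name and 
the sample counts are made up for illustration, and the extension 
to other numbers of choices (dividing by the number of choices less 
one) is an assumption of this example; the text gives only the 
five-choice case. 

```python
def rights_minus_fraction_wrongs(rights, wrongs, choices=5):
    """Score = rights minus wrongs / (choices - 1).

    With five choices this is rights minus one fourth of the wrongs,
    the case described in the text. Omitted items count for nothing.
    """
    return rights - wrongs / (choices - 1)


# Hypothetical paper: 150 questions, 100 right, 40 wrong, 10 omitted.
score = rights_minus_fraction_wrongs(rights=100, wrongs=40)
print(score)
```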


1 The writer is indebted to Lt. Colonel John C. Flanagan and Captain 
Walter L. Deemer of the Army Air Forces for their helpful suggestions 
and criticisms in the preparation of this paper. 


THE PROBLEM 


The International Business Machines test scoring machine 
has three field switches and a master control switch which 
make it possible to get three different part scores from one 
answer sheet inserted once. By assigning the items on the 
answer sheet to the three fields (A, B, and C) and turning the 
master control switch successively from A to B to C, a maximum 
of three different part scores can be obtained. 

If it was desired to get six different part scores, this would 
be accomplished ordinarily by running all the papers through 
the machine twice. Three part scores would be obtained in 
the first run, using a key which controlled the items on the 
first three parts and thus yielded the first three scores. The 
key would then be replaced by a second key controlling the 
items on the last three parts. The answer sheets would then 
be run through the machine a second time to get the last three 
scores. 

The purpose of this paper is to indicate how it is possible 
to get six different part scores from one side of an answer sheet 
in a single run through the test scoring machine. The proce- 
dure described below applies to tests or parts that are scored 
for right responses only. It does not apply to scores which 
deduct a fraction of the wrong answers from the right answers 
(i.e., where the scoring formula is of the form R − W/n). 


Two different set-ups for getting six scores in one run are 
presented. Included in this report also are some preliminary 
data comparing this method of getting six scores in one run 
with the normal method requiring two runs. Finally, there 
is a suggestion for adjusting the test scoring machine to the 
method of getting six scores in one run to increase the speed 
and accuracy of the scoring. 


PROCEDURE 


If all the items are on one side of the answer sheet and if 
the scores are the number of right answers, six different part 
scores may be obtained from one insertion of an answer sheet 
in the test scoring machine. Let us assume that we have a 
150-item answer sheet containing six different parts, Pts. I-VI. 
Let us represent the number of items in each part by N1, N2, 
N3, etc. (N1 = No. of items in Part I, etc.) Each N must be 
a multiple of 15 because there are 10 sections of 15 items each 
and when a section is controlled one way all its 15 items are 
controlled in the same way. The sum of the six N’s cannot 
exceed 150. 

We may now distribute these parts among the three fields 
available, putting 

    Parts I and II on Field A² 
    Parts III and IV on Field B 
    Parts V and VI on Field C. 

Both a rights key and an item elimination key will be needed. 

1. The rights key should contain the rights for Parts I, III, 
and V only. 

2. The item elimination key should be punched as follows: 

a. Punch out both positions (R and W) in the field 
control for each field. (For example, if N1 + N2 = 
60, the first 60 items will be on Field A with both 
the R and W control positions punched out.) 

b. All response positions except the rights for Parts 
II, IV, and VI will be punched out. 

Thus in Field A the only ‘‘wrong’’ answers will be those 
positions corresponding to the rights on Part II. In the table 
below the part score obtained by the different switch settings 
is indicated. 


    Field    Position    Part score    Position    Part score 
      A         R            I            W            II 
      B         R           III           W            IV 
      C         R            V            W            VI 


If desired, the total rights score may be read by throwing 
all fields on A and setting the A switch on R+W. In any 
field, we may obtain the sum of the two part scores by setting 
the switch on R+W and the difference between the two part 
scores by setting the switch on R−W. By varying the parts 
put into a field, the sum and difference for any two parts may 
be obtained. 

² See below for alternative set-up. 
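
The way the two keys turn one field into two part scores can be 
shown with a simplified model. The sketch below is in Python and is 
not a description of the machine’s circuits: it assumes only what 
the text states, namely that a mark on a rights-key position counts 
as a right and that a mark on any position left unpunched on the 
item elimination key counts as a wrong. The function name, the 
six-item field, and the answers are all made up for illustration. 

```python
def field_counts(marks, rights_key, unmasked):
    """Count rights and wrongs for one field.

    marks       -- set of (item, choice) positions marked on the sheet
    rights_key  -- positions carried on the rights key
    unmasked    -- positions NOT punched out on the item elimination
                   key, the only ones still able to count as wrongs
    """
    rights = len(marks & rights_key)
    wrongs = len(marks & (unmasked - rights_key))
    return rights, wrongs


# Field A: Part I is items 1-3 and Part II is items 4-6 (made-up key).
correct = {1: "a", 2: "c", 3: "b", 4: "d", 5: "a", 6: "e"}
rights_key = {(i, correct[i]) for i in (1, 2, 3)}  # rights for Part I only
unmasked = {(i, correct[i]) for i in (4, 5, 6)}    # rights for Part II only

# One examinee: item 2 is missed, everything else is correct.
marks = {(1, "a"), (2, "b"), (3, "b"), (4, "d"), (5, "a"), (6, "e")}

r, w = field_counts(marks, rights_key, unmasked)
part_one_score, part_two_score = r, w     # switch on R, then on W
total, difference = r + w, r - w          # switch on R+W, then on R-W
print(part_one_score, part_two_score, total, difference)
```

In this model the wrong mark on item 2 is counted nowhere, because 
its position has been punched out on the elimination key. That is 
why the procedure suits rights-only scoring. 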


In the set-up described above the machine operator must 
turn the switches in the following order to get the part scores 
in order from I to VI: 


(Set-up A) 
                                  Part score 
    1. Master Switch on A 
    2. A Switch on R                   I 
    3. A Switch on W                   II 
    4. Master Switch on B 
    5. B Switch on R                   III 
    6. B Switch on W                   IV 
    7. Master Switch on C 
    8. C Switch on R                   V 
    9. C Switch on W                   VI 


An alternative set-up is possible. This would mean revising 
the keys so that the part scores would be obtained in order by 
setting 


(Set-up B) 
                                  Part score 
     1. A Switch on R 
     2. B Switch on R 
     3. C Switch on R 
     4. Master Switch on A             I 
     5. Master Switch on B             II 
     6. Master Switch on C             III 
     7. A Switch on W 
     8. B Switch on W 
     9. C Switch on W 
    10. Master Switch on A             IV 
    11. Master Switch on B             V 
    12. Master Switch on C             VI 


COMPARISON WITH USUAL PROCEDURE 


In order to get data for comparing the method of getting 
six scores in one run by set-up A with the usual method of 
getting only three scores at a time, records of the speed and 
accuracy of scoring were kept for some 700 answer sheets. 
The procedure involved two steps, original scoring and checking. 
In the original scoring the scorer calls out a testing number 
and reads the score from the dial, calling it out to the 
recorder. In the checking process the scorer calls out a testing 
number, reads the score from the dial, calls it out to the checker, 
and records it on the answer sheet. An analysis of the results 
was made for each step, for each method, and for each scorer. 
This analysis is presented in Table 1 below: 


TABLE 1 

Summary of Scoring Rates and Errors 

[The No. scored and No. errors† columns are not legible in this 
copy and are omitted; only the rates are shown.] 

                  Original scoring                 Checking 
            6 at a time    3 at a time    6 at a time    3 at a time 
  Scorer    (Method A)                    (Method A) 
              Rate*          Rate*          Rate*          Rate* 

    A          140            150            105             80 
    B          150            135            115            105 
    C          250            215            145            140 


* Rate is in terms of number of answer sheets completely (i.e., all six 
parts) scored per hour. 

† Number of errors is the number of scores (there were six scores per 
answer sheet) that were in error; the number scored is in terms of number 
of answer sheets. Actually no paper was found to have more than 
one score in error. Of the total of 13 errors found only one was an error 
of two points. All others were one point errors. 


Scorer A is a slow but experienced scorer, scorer B is a 
more rapid but inexperienced scorer, and scorer C is a rapid 
and experienced scorer. None of these scorers had ever obtained 
six scores in one run before. The figures for speed 
would seem to indicate that scorer A ought to score by the 
three-at-a-time method and check by the six-at-a-time method 
(Method A). However, this scorer was slow to adjust to 
Method A. After more experience with this method, her 
speed would probably increase. Her speed for the three-at-a-time 
method is not likely to increase since it agrees with previous 
records of her scoring rate. 

From the meager evidence in Table 1 it would appear that 
one six-at-a-time run is faster than two runs of three-at-a-time. 
Based on the figures for scorer C (these are believed to be the 
most reliable figures) the difference in favor of the six-at-a-time 
method is greater when the scores are merely called out than 
when the scorer also writes them down. This appears reason- 
able since scorer C was able to make fast time by using both 
hands on the switches while scoring, but had to keep one hand 
free for recording during the checking process. 
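
The size of the advantage for scorer C can be put in percentage 
terms. The sketch below is in Python and takes scorer C’s rates as 
they can be read from Table 1: 250 and 215 answer sheets per hour 
for original scoring, and 145 and 140 for checking. These readings 
are an assumption of this example, since parts of the table are hard 
to read, and the variable names are made up. 

```python
# Scorer C, answer sheets completely scored per hour:
# (six-at-a-time, three-at-a-time)
rates_c = {
    "original scoring": (250, 215),
    "checking": (145, 140),
}


def percent_gain(six_at_a_time, three_at_a_time):
    """Gain of the six-at-a-time rate over the three-at-a-time rate."""
    return 100 * (six_at_a_time - three_at_a_time) / three_at_a_time


gains = {step: round(percent_gain(*pair), 1) for step, pair in rates_c.items()}
print(gains)
```

On these figures the gain is roughly 16 per cent when scores are 
only called out and under 4 per cent when the scorer also records 
them, which is the contrast the text describes. 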

It should be noted that the time for the three-at-a-time 
method is actual running time between the insertion of the first 
answer sheet and the release of the last answer sheet for each 
set. It does not include the time necessary for changing keys, 
for checking the machine, or for the extra handling of answer 
sheets. 

If much scoring of six parts per answer sheet is to be done, 
it is probable that a single switch with six (or more) positions 
corresponding to R and W on Fields A, B, and C, or two 
switches of three positions each 

    1. R Switch for Fields A, B, and C 
    2. W Switch for Fields A, B, and C 

could be set up to simplify the turning of switches. 

With respect to the accuracy of scoring the table seems to 
indicate that there is little or no difference between the two 
methods. The percentage of error was very small in both 
cases. With a one (or two) switch set-up for the six-at-a-time 
method the expectancy of error ought to be less than for the 
three-at-a-time method. 


THE INTELLIGENCE OF JEWISH COLLEGE 
FRESHMEN AS RELATED TO 
PARENTAL OCCUPATION 

AUDREY M. SHUEY 
Washington Square College, New York University 


A. INTRODUCTION 


SEVERAL investigators have found that white college 
students, when grouped according to the occupations of 
their fathers, vary in average mental ability from the 
professions to unskilled labor. Sandiford (8) presents the 
combined findings for 5052 high school, normal school, and 
university students in British Columbia.¹ He reported an 
average difference of 4.3 IQ points separating the offspring 
of unskilled laborers from those of fathers in the professions. 
Between these occupational extremes were ranked the children 
of semi-skilled laborers, farmers, skilled laborers, and business 
and clerical workers respectively.² ‘‘The New York and 
British Columbia results show that children of parents engaged 
in these various occupations also exhibit different levels of 
intelligence.—Intelligence sufficiently high to achieve success 
in a profession is handed down to children’’ (8, p. 119). 
Haught (4) converted into percentile rankings the Army 
Alpha and the American Council Psychological Examination 
scores of 3414 University of New Mexico freshmen. Haught 
found greater disparity between his extreme groups than did 
Sandiford, the mean score of the offspring of professional 
people being at the 60.21 percentile level, that of the children 
of unskilled laborers at the 34.72 percentile level. The critical 
ratios were significant in all instances except between the 
professional and the business and clerical groups. ‘‘Since the 
results for college students are found to be similar to those 
for grade school children, it can be concluded that the longer 
period of formal schooling of a somewhat standardized type 
does not equalize individuals of varying socio-economic 
backgrounds with respect to intelligence test performance’’ (4, 
p. 209). 


1 The university group included an unspecified number of freshmen and 
graduate students. All S’s were tested by the modified Alpha. 
2 No critical ratios reported. 

Bear (1) found the 94 freshmen at Centre College to rank 
on the Otis Self-Administering Test according to the following 
order of parental occupation: salesmen, professions, business, 
skilled artisans, and farmers. However, no significance can 
be attached to the shifting in position of the professional group 
from first to second place because there were only 10 and 11 
respectively in the first two groups and only 0.41 points sepa- 
rated their mean scores. 

As far as the writer knows there have been no studies reported 
on the intelligence of Jewish college students as related 
to parental occupation, although there was one such investigation 
made on Jewish school children in London. Hughes (5) 
did not find the usual decreases in average scores as they 
ranged according to parental occupation from professional 
men and masters to the unskilled and laborers. In fact two 
groups of children scored higher than the offspring from the 
professional and master class, one of them being from the 
unskilled and laboring groups,³ including carmen, coalmen, 
and caretakers, and the other from the shopkeeper, dealer, and 
small master group. ‘‘This result, especially in Group 3,⁴ 
seems to indicate that commerce attracts Jews of good intelligence, 
men of the calibre who, if they were non-Jews, would 
probably enter the professions’’ (5, p. 93). The writer 
thought it would be interesting, in view of the above findings, 
to learn how Jewish college students compare with one another 
when arranged according to their fathers’ occupations. 


3 Fewer than 20 S’s in this group (7). 
4 Shopkeepers, dealers, and small masters. 


B. THE TESTS 


The American Council Psychological Examination was 
given to new students matriculating in Washington Square 
College by members of the Psychology Department from Sep- 
tember, 1935, to September, 1937, inclusive. The 1935 edition 
was given to the September, 1935, and the February, 1936, 
entrants, the 1936 edition to the September, 1936, and the 
February, 1937, entrants, and the 1937 edition to those tested 
in September, 1937. All tests were scored and tabulated by 
the Coöperative Testing Bureau in New York City. 

A maximum number of 407 points was obtainable from each 
of the 1935, 1936, and 1937 forms of the examination. 


C. THE SUBJECTS 


The writer recorded separately on index cards the names of 
all students tested and their test scores. On the backs of these 
cards was subsequently included information secured from the 
Recorder’s Office concerning the following items: Permanent 
residence, Date of birth, Place of birth, Race, Religion, High 
school or college from which admitted, Name, Occupation and 
Birthplace of father, and Name and Birthplace of mother. 

Nine hundred twenty-one of the 3655 students tested were 
included in the present study, all of this number having satisfied 
the requirements of being male, Jewish, native-born, freshmen, 
graduates of public high schools of New York City or of 
its suburbs, not over 20 years when tested, of both native-born 
parents or of both foreign-born parents, as well as having 
given classifiable information regarding their fathers’ 
occupation.⁵ 

Of the 921 freshmen included in the study, 313 were tested 
by the 1935 edition of the A.C.P.E., 387 by the 1936 edition, 
and 221 by the 1937 edition. All individual scores on the 
1936 and 1937 forms were transmuted into equivalent scores 
of the 1935 edition before they were combined with one another. 
The parents of 181 of the S’s were native-born, while 
those of 740 were foreign-born. Approximately 58 per cent 
of the foreign-born parents were born in Russia, 16 per cent 
in Austria, 14 per cent in Poland, 5 per cent in Rumania, 3 
per cent in Hungary, and the remaining 4 per cent in various 
other countries of Europe, the British Empire, and the Near 
East. 


5 Ninety-five S’s who otherwise satisfied the above criteria had to be 
eliminated because they failed to give the occupation of the father, 
reported that he was dead, retired, had no occupation, or gave a term that 
was too general or too ambiguous to be classified. 


D. THE RESULTS 


The occupations of the Jewish fathers were classified accord- 
ing to the listings in the Dictionary of Occupational Titles (3, 
6). The Dictionary differs from other classifications princi- 
pally in the precise categorizing of thousands of occupations 
and in the inclusion of a category of ‘‘Service Occupations’’ 
combining domestic service, personal service, protective ser- 
vice, and building service and porters. These ‘‘services’’ are 
ordinarily categorized as skilled, semi-skilled, or unskilled 
labor. In Table 1 are presented the means and standard 
deviations of the scores made by the 921 Jewish males tested 
by the three editions of the A.C.P.E. and arranged according 
to their fathers’ occupations. Of the native-born fathers, only 
one was reported as being employed in a service occupation, 
one in semi-skilled labor, and none in unskilled labor. With 
the omission of these three groups there remain only three 
categories to be compared with one another among the S’s of 
native parents, i.e., professional and managerial, clerical and 
sales, and skilled labor. The first two are very close together, 
with the clerical and sales ranking first instead of second as 
might have been expected, and skilled labor some 14 to 16 
points below the two. 

Among the sons of foreign-born parents the ranking is in 
the order of professional and managerial, skilled labor, clerical 
and sales, service, and semi-skilled labor. Clerical and sales, 
which ranked first among sons of native parents, is in third 
place among offspring of the foreign-born. It will be noticed 
that only 2.97 points separate sons of professionals and managers 
from skilled labor, and only 10.02 points the extremes, 
i.e., the professional and managerial from semi-skilled labor. 

In the combined group of S’s of both native and foreign-born 
parents the groups rank in the following order: professional 
and managerial, clerical and sales, skilled labor, service 
occupations, and semi-skilled labor. This is in accord with 
the traditional ranking, but the difference between the highest 
and the lowest of the ranks is only 10.49. The sons of native-born 
parents average higher scores than the sons of the foreign-born 
parents for each occupational group. It should be 
noted that as one goes from the highest to the lowest of the 
occupational groups one finds relatively more children of foreign-born 
parents. The sons of the foreign-born made up 72 
per cent of the professional and managerial class, 79 per cent 
of the clerical and sales group, and 91, 92, and 95 per cent 
respectively of the service, skilled, and semi-skilled groups. 


TABLE 1 

Means and Standard Deviations of Total Scores of 921 Jewish Males, 
Tested by the 1935, 1936, and 1937 Editions of the American 
Council Psychological Examination* 

[".." marks an entry that is blank or illegible in this copy.] 

                                            Birthplace of parents 
  Father’s                      Native              Foreign              Combined 
  occupation                N   Mean    SD      N   Mean    SD      N   Mean    SD 

  Professional & Managerial 74  198.55  45.82  192  188.67  45.09  266  191.42  45.25 
  Clerical & Sales          92  200.80  40.82  347  185.19  46.37  439  188.46  45.71 
  Service Occupations        1    ..      ..    10  181.50  56.05   11  181.78  53.44 
  Skilled Labor             13  184.50  40.19  150  185.70  40.13  163  185.60  40.14 
  Semi-skilled Labor         1  274.0     ..    41  178.65  42.71   42  180.93  44.66 
  Total                    181  199.03  43.27  740  185.78  44.76  921  188.39  44.75 

* All individual scores were first transmuted into equivalent scores of 
the 1935 edition. 


TABLE 2 

Differences between Means, Sigmas of the Differences, Critical Ratios, 
and Chances in 100 of True Differences Greater than Zero 

[".." marks an entry illegible in this copy. In the combined section 
the chances of 93 in 100 and the CR of 3.68 are taken from the text.] 

                                                     Higher                 SD          Chances 
  Groups                                             mean           Diff.   diff.  CR   in 100 

  Of Native Parents 
  Prof. & Manag. and Clerical & Sales                Clerical & Sales  2.25   6.82  .33   63 
  Prof. & Manag. and Service, Skilled, Semi-Skill.   Prof. & Manag.    8.05  12.82  .63   74 
  Clerical & Sales and Service, Skilled, Semi-Skill. Clerical & Sales 10.30  12.41  .83   80 

  Of Foreign-born Parents 
  Prof. & Manag. and Clerical & Sales                Prof. & Manag.    3.77   3.94  .96   83 
  Prof. & Manag. and Skilled Labor                   Prof. & Manag.    3.26   4.61  .71   76 
  Prof. & Manag. and Semi-Skilled Labor              Prof. & Manag.   10.31   7.42 1.39   92 
  Prof. & Manag. and Service, Skilled, Semi-Skill.   Prof. & Manag.    4.91   4.38 1.12   86 
  Clerical & Sales and Skilled Labor                 Skilled Labor      .51   3.96  .13   55 
  Clerical & Sales and Semi-Skilled Labor            Clerical & Sales  6.54   7.03  .93   83 
  Clerical & Sales and Service & Semi-Skilled        Clerical & Sales  5.98   6.77  .88   81 
  Skilled and Semi-Skilled Labor                     Skilled           7.05   7.43  .95   83 
  Skilled and Service & Semi-Skilled                 ..                6.49   7.18  .90   .. 

  Of Native and Foreign-born Parents Combined 
  Prof. & Manag. and Clerical & Sales                Prof. & Manag.    3.16   3.53  .90   .. 
  Prof. & Manag. and Skilled Labor                   ..                 ..     ..    ..   93 
  Prof. & Manag. and Semi-Skilled Labor              ..                 ..     ..    ..   93 
  Prof. & Manag. and Service & Semi-Skilled          ..                 ..     ..    ..   93 
  Clerical & Sales and Skilled Labor                 Clerical & Sales   ..     ..    ..   .. 
  Clerical & Sales and Semi-Skilled Labor            ..                 ..     ..    ..   .. 
  Clerical & Sales and Service & Semi-Skilled        ..                 ..     ..    ..   .. 
  Skilled Labor and Semi-Skilled Labor               Skilled Labor      ..     ..    ..   .. 
  Skilled Labor and Service & Semi-Skilled           ..                 ..     ..    ..   .. 

  Native Parents and Foreign-born Parents            Native Parents   13.25    ..  3.68   .. 


As will be noted from Table 1, the SD’s range between 40 
and 46, with the exception of the group of service occupations 
where the Standard Deviations are 53.44 and 56.05. One 
would expect greater variability among the scores in this category 
than among the others since a wide range of occupations 
requiring much or little skill is included here. 

The only difference that was found to be significant was that 
between the mean scores of the sons of native parents and those 
of foreign-born parents where the CR was 3.68. No differ- 
ence reliably greater than zero was found between any of the 
occupational groups. Of 21 comparisons among the occupa- 
tional groups only four showed more than 86 chances in 100 
of a true difference. One of these was between the sons of 
professional and managerial and the semi-skilled foreign-born 
parents where the chances were 92 in 100, and the other three 
in which the native and foreign-born parents were combined 
were between professional and managerial and skilled labor, 
between professional and managerial and semi-skilled labor, 
and between professional and managerial and service and semi- 
skilled labor combined. In these three latter comparisons the 
chances were 93 in 100 of a true difference. 
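
The ‘‘chances in 100’’ figures quoted above can be related to the critical ratios by a short sketch. It assumes the conventional reading of the period, in which the chances of a true difference are the area of the normal curve lying below the obtained CR; the function name is hypothetical and the paper does not state which table was used.

```python
from statistics import NormalDist


def chances_in_100(cr: float) -> int:
    """Chances in 100 that the true difference exceeds zero.

    Assumption: the figure is the cumulative normal area below the
    critical ratio (difference divided by its standard error).
    """
    return round(NormalDist().cdf(cr) * 100)


# Critical ratios taken from Table 2 and the text.
for cr in (0.95, 1.39, 3.68):
    print(f"CR = {cr:.2f}: about {chances_in_100(cr)} chances in 100")
```

Under this assumption a CR of 1.39 corresponds to about 92 chances in 100, which agrees with the figure given for the professional and managerial versus semi-skilled comparison.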


E. SUMMARY AND CONCLUSIONS


Nine hundred twenty-one Jewish males entering Washing- 
ton Square College as freshmen, born in the United States, 
graduates of the public high schools of New York City or its 
suburbs, and not over 20 years of age, were among those given 
the American Council Psychological Examination between 
1935 and 1937. The parents of 181 of these students were 
native-born while the parents of 740 of them were foreign- 
born. 

The students were further divided into groups according to
the occupational statuses of their fathers and their mean scores
compared with one another. We found the usual ranking
from professional and managerial through skilled labor not
to be borne out in the rankings of the sons of native parents.
The mean of the professional and managerial sons was 198.55,
while the mean of the total group of 181 males was slightly
higher, i.e., 199.03. None of the differences between the
groups of sons of native parents was significant.

The ranking of the S’s of foreign-born parents was in the 
order of: professional and managerial, skilled labor, clerical 
and sales, service, and semi-skilled labor, again not quite the 
usual ranking of scores according to parental occupation. 
Furthermore, less than three points separated the average 
scores of the total group from the average of the professional 
and managerial group. None of the differences obtained be- 
tween the offspring of the foreign-born was significant. 

In combining the scores of the children of native and for- 
eign-born we found the rankings to follow the usual trend, 
namely professional and managerial, clerical and sales, skilled 
labor, and semi-skilled labor. The service occupations con- 
sisting of various degrees of skill fell between the skilled and 
the semi-skilled labor groups. Again the difference between 
the professional and managerial group average and that of the
total group was only three points, and no differences between 
the groups were found to be reliably greater than zero. 

Although the comparison between the sons of the foreign-born
and the sons of native-born parents was an incidental one,
it happened to be the only comparison that yielded a
significant difference. This difference favored the latter group.

The difference between the average scores of the extremes
of the occupational categories was from 10 to 14 points,
a non-reliable difference. By contrast, the difference found
between a group of Negro and White students at Washington
Square College (9), equated with one another in various ways
including occupational ranking, was from 40 to 44 points of the
A.C.P.E., this being a significant difference. From these two
studies it would seem that the ‘‘occupational’’ factor is of
much less influence on intelligence test performance of college
students at Washington Square College than is the ‘‘racial’’
factor.

The absence of significant differences between our groups
of Jewish males may have been due in part to any or to sev- 
eral of the following factors: (1) Our S’s were for the most 
part first or second generation Americans, the economic posi- 
tions of their parents being less stable than those of many 
other college students. (2) Our students were Jewish, thereby 
probably having greater similarity of culture in their varied 
socio-economic statuses than is the case of non-Jewish S’s. (3) 
We controlled certain variables including age and urban-rural 
factors not commonly considered in studies of this type. (4) 
The higher tuitional requirement may have resulted in greater 
selectivity in the middle and lower occupational brackets at 
Washington Square College than at the University of New 
Mexico where the principal investigation in this field was 
made. 
AN INTEREST TEST FOR ROUTE SALESMEN
AND MECHANICS¹


RUTH DIETZ CHURCHILL 
War Department 


I. INTRODUCTION 


For some years it has been possible to differentiate the
interests of men engaged in professional and business
occupations by means of the Strong Vocational Interest
Blank; no comparable blank has been available for lower
occupational levels. Data from the Employment Stabilization
Research Institute (1) indicate that while the criterion groups
used by Strong were 82 per cent professional and
semi-professional and only 18 per cent non-professional, the
distribution of adults gainfully employed in three large cities in
Minnesota was 14 per cent professional and semi-professional
and 86 per cent non-professional. When the Strong Interest
Blank was given to these non-professional groups, there were
indications of differential occupational interest patterns. It
seemed possible, therefore, to construct a test specifically
designed to differentiate non-professional occupations; the
present study is a preliminary report on an attempt to do so.
The study reported here was limited to constructing a test 
differentiating route salesmen from mechanics. There were 
several reasons for this: the problem was a practical one; the 
two groups were sufficiently distinct and numerous to warrant 
the construction of a test; and if differences could be estab- 
lished between these two groups, the possibility of a non- 
professional interest blank would be demonstrated. 


1 This study was done at the University of Minnesota under the direc- 
tion of Donald G. Paterson. 
II. THE TEST


We did not attempt to devise an interest test radically dif- 
ferent from those in use but rather to adapt the usual type of 
test to the groups of men with whom it was to be used. This 
was done mainly by selecting and devising items more likely 
to be part of the experience of these men than many of the 
items previously used. It seems probable that to be most 
useful, a test should contain items especially adapted to the 
groups with which it is to be used. 

The original test consisted of 247 items to which the subject
responds Like, Indifferent, or Dislike; these were arranged in
five subtests: occupations (166), amusements (15), school
subjects (13), activities (32), and peculiarities of people (21).
The items were drawn from previous interest blanks (106) and
from the Minnesota Occupational Rating Scale (94); 47 items
were specially devised for this test. They were selected as the
items most likely to differentiate salesmen from mechanics.
Directions for taking the test were given at the beginning of
each section.
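
The make-up of the original blank can be checked with a few lines of arithmetic. The sketch below uses only the counts given in the paragraph above; the variable names are illustrative.

```python
# Item counts by subtest and by source, as stated in the text.
subtests = {
    "occupations": 166,
    "amusements": 15,
    "school subjects": 13,
    "activities": 32,
    "peculiarities of people": 21,
}
sources = {
    "previous interest blanks": 106,
    "Minnesota Occupational Rating Scale": 94,
    "specially devised": 47,
}

total_items = sum(subtests.values())
assert total_items == sum(sources.values()) == 247

# Share of the blank taken up by each subtest, in per cent.
shares = {name: round(100 * n / total_items, 1) for name, n in subtests.items()}
for name, pct in shares.items():
    print(f"{name:<25} {pct:5.1f} per cent")

# Occupations and activities together.
occ_and_act = round(100 * (subtests["occupations"] + subtests["activities"]) / total_items, 1)
print("occupations + activities:", occ_and_act, "per cent")
```

Both breakdowns sum to 247 items, and occupations and activities together make up about 80.2 per cent of the original blank.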

The method of administering the test to the criterion groups 
was simply by asking each man personally for cooperation 
and then giving him the test to fill out by himself. The work- 
ing conditions of the men necessitated this procedure, one 
often used clinically. 


III. THE CRITERION GROUPS 


The test was given to four groups of men: one group of 41
wagon-route salesmen for a bakery company and three groups
of mechanics: 10 automobile mechanics in the City of
Minneapolis garage, 13 carpenters from a woodworking company,
and 38 mechanics employed by a company manufacturing
automatic heating and air conditioning equipment. Thus the
salesmen were a narrow group while the classification of the
mechanics was a broad one. The cooperation of the workers was
gained through their employers.

As can be seen in Table 1, the 41 salesmen and the 61
mechanics differed widely in age, schooling, and experience.
The group differences in age and experience are beyond the .01
level of probability while that for education is between the .01
and the .02 level. Comparisons of the two groups are to be
made with caution, however, as the personal data on the 38
mechanics from the air conditioning company were
unavailable; it is conceivable that these would change the picture.


TABLE 1

Personal Data on the Criterion Groups

                     Age*                Education*             Experience*
               N     Mean   S.D.     N     Mean   S.D.     N     Mean   S.D.
Mechanics     22     42.1    2.8    15      9.7    2.0    20     22.2   11.2
Salesmen      37     28.7    7.8    38     11.3    2.2    36      4.8    5.1
Diff.                13.4                   1.6                  17.4
S.E. diff.            1.7                   0.7                   2.2
t               7.76, p < .01       2.47, .02 > p > .01      7.98, p < .01

* In years.
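
The t values in Table 1 can be approximated from the summary figures alone. The sketch below assumes a pooled-variance t of the kind given by Fisher's formula and treats the tabled S.D.'s as computed with N - 1; the age figures are as best they can be read from the table, and the function name is hypothetical.

```python
from math import sqrt


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, standard error of the difference) from summary statistics.

    Assumption: the two groups share a common variance, estimated by
    pooling the two sums of squares over n1 + n2 - 2 degrees of freedom.
    """
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    se_diff = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se_diff, se_diff


# Age: mechanics (N = 22) versus salesmen (N = 37).
t_age, se_age = pooled_t(42.1, 2.8, 22, 28.7, 7.8, 37)
print(f"t = {t_age:.2f}, S.E. of difference = {se_age:.2f}")
```

With these inputs the sketch gives a standard error of about 1.7 and a t close to the tabled 7.76; small discrepancies are to be expected from rounding in the published means and S.D.'s.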


Mechanics do, however, gain trade status at a fairly late age 
so that we may accept it as probable that if men are selected by 
testing those employed at given companies, the age and ex- 
perience of mechanics are likely to be greater than those of 
salesmen, of whom no apprenticeship is required. 

The question arises as to whether any differences between
the groups on the test are to be attributed to these
background differences in addition to, or rather than, the
occupational differences. Since this test is similar to previous
interest blanks, we may refer to Strong’s data on this point.
Strong has shown that age changes in interest between 25 and
55 years are slight. He also found that ‘‘interests were not
particularly affected by years of activity in a given
occupation’’ (3). We shall later present some evidence of our own
to indicate that differences of age, experience, and education
probably do not account for the differences found between
the two groups.
IV. STATISTICAL PROCEDURE


Using the papers from these two criterion groups, we
constructed scoring keys for the test. The first step in this
process was to determine whether the two groups might have
been drawn from the same population or were appreciably
dissimilar. χ² tests were run on two samples of 10 items each;
in both cases the difference between the salesmen and the
mechanics was beyond the .01 level of probability, indicating
that it was significant. After this general difference had been
established, item analysis was used.

For the item analysis, the two groups were compared with
each other rather than with a ‘‘Men in General’’ group. The
proportion of one group marking any one item either ‘‘Like,’’
‘‘Indifferent,’’ or ‘‘Dislike’’ was compared with the proportion
of the other group which marked the same item in the same
way, giving a total of 864 comparisons. The significance of
each comparison was measured by the use, for ease of
computation, of the formula,


$$\mathrm{C.R.} = \frac{p_1 - p_2}{\sqrt{\sigma_{p_1}^{2} + \sigma_{p_2}^{2}}}$$

and of Paterson and Edgerton’s tables (2). Since N rather 
than the number of degrees of freedom was used in computing 
the standard deviation of the distribution, our use of the 
tables does not yield true values of t.2 The error introduced 
was probably slight. It was necessary to set an arbitrary 
value of C.R. to determine whether or not a comparison was 
to be included in the scoring key; we chose 2.56, which is at 
the .01 level of probability for an infinite number of cases. Of 
the 864 comparisons, by chance between 6 and 12 would be 
expected to meet this criterion; actually 129 did. 
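
The item-analysis criterion can be illustrated with a short sketch. It assumes that the standard error of each proportion is taken as the square root of pq/N, in keeping with the remark that N rather than the degrees of freedom was used; the two proportions in the example are hypothetical and are not taken from the study.

```python
from math import sqrt


def critical_ratio(p1: float, n1: int, p2: float, n2: int) -> float:
    """C.R. for the difference between two independent proportions."""
    var1 = p1 * (1 - p1) / n1  # squared standard error of p1
    var2 = p2 * (1 - p2) / n2  # squared standard error of p2
    return (p1 - p2) / sqrt(var1 + var2)


CUTOFF = 2.56  # value of C.R. chosen in the text

# Hypothetical item: 60 per cent of the 41 salesmen and 30 per cent
# of the 61 mechanics mark the same response.
cr = critical_ratio(0.60, 41, 0.30, 61)
keep = abs(cr) >= CUTOFF
print(f"C.R. = {cr:.2f}; include in key: {keep}")

# Comparisons expected to reach the .01 level by chance alone.
expected_by_chance = 864 * 0.01
print(f"Expected by chance: about {expected_by_chance:.1f} of 864")
```

About 8.6 of the 864 comparisons would be expected to reach the .01 level by chance, which falls within the range of 6 to 12 quoted above.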


V. RESULTS 


The final test of validated items shows some slight
differences from the original groups of items. Occupations and
activities account for 89.8 per cent of the items instead of 80.2
per cent as previously; peculiarities of people drops from 8.5
per cent to 2.3 per cent. It seems probable that amusements,
school subjects, and peculiarities of people could be dropped
as separate headings and combined with the activities items
with no loss in the test’s efficiency but rather an increase. The
types of items liked and disliked by salesmen and mechanics
are what one might expect. A few differences are unexpected;
why, for example, should salesmen strongly dislike freehand
drawing?


2 Elsewhere in this report, t values are those given by Fisher’s formula.
See: Fisher, R. A., Statistical Methods for Research Workers, 6th ed.,
London: Oliver & Boyd, 1936, p. 128.

Two methods of scoring the test were used. In the first,
each item which salesmen favored over mechanics gave a score
of one unit on a salesmen’s key; similarly, each item the
mechanics favored gave a unit score on a mechanics’ key. In
the second method, the keys were combined: items on the sales
key were scored +1, and items from the mechanics’ key were
scored -1, the total score being the sum of the plus and minus
scores. In both methods, elaborate item analysis and scoring
weights were avoided, and a simple method was employed.
There is evidence to show that the arbitrary use of +1 and
-1 weights rather than weights in accord with the size of the
differences between the groups correlates .927 with a more
elaborate method (4).
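
The two scoring methods can be sketched as follows. Each key is represented here as a set of (item, response) pairs, which is an assumption about layout rather than the author's scoring stencil, and the sample items are hypothetical except for freehand drawing, which the text mentions as disliked by salesmen.

```python
def score_blank(responses: dict, sales_key: set, mechanics_key: set):
    """Return (sales score, mechanics score, combined score).

    Each keyed (item, response) pair that the subject marked adds one
    unit to its own key; the combined key counts sales-key pairs as +1
    and mechanics-key pairs as -1.
    """
    marked = set(responses.items())
    sales = len(marked & sales_key)
    mechanics = len(marked & mechanics_key)
    return sales, mechanics, sales - mechanics


# Hypothetical miniature keys for illustration only.
sales_key = {("Sales clerk", "Like"), ("Freehand drawing", "Dislike")}
mechanics_key = {("Auto repairman", "Like"), ("Sales clerk", "Dislike")}

responses = {
    "Sales clerk": "Like",
    "Freehand drawing": "Dislike",
    "Auto repairman": "Indifferent",
}
result = score_blank(responses, sales_key, mechanics_key)
print(result)
```

Because the combined score is simply the sales score minus the mechanics score, a man who registers equally on both keys lands near zero whether both scores are high or both are low.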


TABLE 2
Validated Items

               Like   Indifferent   Dislike   Total
Mechanics       15        14           25       54
Salesmen        45         3           27       75





Table 2 shows the distribution of the 129 validated items.
The sales key thus consisted of 75 items, and the mechanics’
key, of 54. The combined key using all 129 had a possible
score range of +75 to -54.

TABLE 3
The Scores of the Criterion Groups on the Three Keys

                    Mechanics’ key          Salesmen’s key           Combined key
                 Mechanics   Salesmen    Mechanics   Salesmen    Mechanics   Salesmen
Mean               29.85       13.09        9.82       33.74      -20.02       20.65
S.D.                7.29        6.38        6.77       11.01       11.20       15.71
S.E. of mean        .934        .997        .874        1.72        1.43        2.45
Range            11 to 42     2 to 27     2 to 41    12 to 61   23 to -38   -12 to 56
Diff. of means         16.76                   23.92                   40.67
t                  11.97, p < .01          14.18, p < .01          15.43, p < .01

The data presented in Table 3 indicate that the means of
the two criterion groups differ significantly, no matter whether
the two keys, the salesmen’s and the mechanics’, or the
combined key is used. All the t values exceed the .01 level of
probability. The salesmen are more variable than the
mechanics on the sales key and are probably so on the combined key.
There is little overlapping between the two groups on any
of the keys. On the mechanics’ key, 4 salesmen exceed -1
S.D. of the mechanics’ distribution, although 14 mechanics
fall below this point; 82.4 per cent of both groups may be
said to be correctly placed. Two mechanics exceed -1 S.D.
of the salesmen’s distribution on the sales key while only 6
salesmen score below this point; 93.1 per cent are correctly
placed on this key. The 18 misplaced on the mechanics’ key
and the 8 misplaced on the sales key are accounted for by 21,
rather than 26, men. Two mechanics and 3 salesmen not only
scored more than a S.D. below the means of their respective
groups but also scored above the corresponding point on the
other key. The other men fall into two categories: 15 men, 3
salesmen and 12 mechanics, who scored more than a S.D. below
the mean on both keys, and one salesman who scored above
this point on both keys. On the combined key the scores fall
in three ranges: ‘‘mechanics’ scores,’’ all those -9 or lower;
‘‘sales scores,’’ those +5 or higher; and ‘‘neutral scores,’’ those
falling between -9 and +5. Fifty-three mechanics and 36
salesmen, a total of 87.3 per cent, were correctly placed. One
mechanic and 3 salesmen, 2.9 per cent of the total group, had
scores the opposite of what might have been expected; the
remainder, 7 mechanics and 3 salesmen, had ‘‘neutral’’ scores.
All of those incorrectly placed on the combined key are also
incorrectly placed on one or both of the single keys. As far
as differentiating the criterion groups goes, there is little to
choose between the use of the two keys and that of the
combined key.
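
The placement rule used above can be sketched in a few lines. The cutting points are each group's mean less one S.D., taken from Table 3; the function names are hypothetical, and the rounding of the combined-key points to -9 and +5 follows the text.

```python
def minus_one_sd(mean: float, sd: float) -> float:
    """Point one S.D. below a group's mean on its own key."""
    return mean - sd


def classify_combined(score: float, mech_cut: float = -9, sales_cut: float = 5) -> str:
    """Place a combined-key score in one of the three ranges."""
    if score <= mech_cut:
        return "mechanic"
    if score >= sales_cut:
        return "salesman"
    return "neutral"


def percent_correct(total: int, misplaced: int) -> float:
    """Per cent of the whole group that is correctly placed."""
    return round(100 * (total - misplaced) / total, 1)


# Cutting point on the mechanics' key (criterion mechanics, Table 3).
print(round(minus_one_sd(29.85, 7.29), 2))
# The 18 men misplaced on the mechanics' key, out of 102.
print(percent_correct(102, 18))
# A few combined-key scores.
print([classify_combined(s) for s in (-20, 0, 21)])
```

On the combined key the mechanics' mean plus one S.D. (-20.02 + 11.20) and the salesmen's mean less one S.D. (20.65 - 15.71) come to about -8.8 and 4.9, which is where the limits of -9 and +5 originate.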

The odd-even reliability of the mechanics’ key is, uncor- 
rected, .882, and corrected by the Spearman-Brown formula, 
.937; that of the sales key is .892, and corrected, .945. The 
two values for the combined key are .931 and .964. These 
values are high enough to justify the use of any one of the 
three keys for predicting individual scores. 
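
The Spearman-Brown correction applied to the odd-even coefficients is the usual step-up for doubling test length. A minimal sketch, with a hypothetical function name:

```python
def spearman_brown(r_half: float) -> float:
    """Estimated full-length reliability from a split-half correlation."""
    return 2 * r_half / (1 + r_half)


# Uncorrected odd-even values for the mechanics' and the combined keys.
for label, r in (("mechanics' key", 0.882), ("combined key", 0.931)):
    print(f"{label}: {r:.3f} -> {spearman_brown(r):.3f}")
```

Applied to .882 and .931 the formula returns .937 and .964, the corrected values reported for those two keys.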

Neither in differentiation of mechanics and salesmen nor in
reliability is there much to choose between the use of the two
separate keys or of the combined key.3 It may be more
convenient to work with a single score, although there are
theoretical objections to the use of a combined key. What the
test measures in each comparison is only salesmen versus
mechanics, so that the failure to earn a mechanic’s score on a
particular item represents not so much a non-mechanic’s score
as a salesman’s score, and conversely for a sales score. Thus
a low score on the mechanics’ key can be interpreted as a sales
rather than a non-mechanic’s score; a low score on the sales
key, as a mechanic’s rather than a non-sales score. We can
definitely identify non-sales and non-mechanic’s scores only by
separately comparing the two groups with a ‘‘Men in General’’
group. The ‘‘neutral’’ range on the combined key represents
a compound of mechanics’ and salesmen’s interests, either
negatively, through not registering on either key, or positively,
through registering on both of them. It is likely, then, that
combining the keys may cause some confusion which using the
single keys would avoid.

3 In fact, the salesmen’s and the mechanics’ keys correlate -.77; and
the mechanics’ and the combined keys -.86, indicating that all three
keys measure essentially the same thing.
VI. EFFECT OF AGE, EXPERIENCE, AND EDUCATION


Previously we have shown that the mechanics tested were
significantly older and more experienced than the salesmen,
and probably significantly less well educated. On the basis of
previous studies, it was assumed that these background
differences could not have accounted for the interest differences
which have been found between the two groups. To test this
assumption further, the resulting scores on each of the keys
of the youngest and the oldest, the least and the most
experienced, and the least and the most educated quarters of the
salesmen’s group were compared. These comparisons were, in
each case, between two groups of salesmen less alike in respect
either to age, experience, or education, than was one of these
two groups and the mechanics’ group. Between the youngest
and the oldest salesmen there was a difference of 16.8 years
in contrast with 3.2 years between the oldest salesmen and the
mechanics. The least educated group had 5.8 years less
schooling than the best educated group and 1.6 years less than
had the mechanics. In experience, the contrast was between
1.3 and 11.7 years for the least and the most experienced
salesmen and between 11.7 and 22.2 years for the most experienced
salesmen and the mechanics. If any of these factors seriously
affect the interest patterns, we would expect the oldest, the
most experienced, and the least educated groups of salesmen
to show lower sales scores and higher mechanics scores than
did the youngest, the least experienced, and the most educated.
As Table 4 shows, there is no evidence whatsoever of such
being the case. Within the sales group, at least, background
differences do not affect the test scores, and thus we have
additional evidence that the differences in interest patterns between
the criterion groups are not caused by background differences.


TABLE 4

Comparison of Youngest and Oldest, Most and Least Experienced, Most
and Least Educated Salesmen

                     Age, education,     Scores on          Scores on         Scores on
                     or experience*      mechanics’ key     salesmen’s key    combined key
                      Mean    S.D.        Mean    S.D.       Mean    S.D.      Mean    S.D.

Youngest              22.1     1.2        12.7     7.0       31.4    12.6
Oldest                38.9     5.7        12.6     5.7       32.6     7.3
  Diff.               16.8                 0.1                1.2
  t                    8.6                 0.03               0.4

Least Education        8.3     0.5        11.2     5.8       35.1    10.9
Most Education        14.1     1.1        15.4     5.2       35.0    14.3
  Diff.                5.8                 4.2                0.1
  t                   14.8                 1.6                0.02

Least Experience       1.3     0.3        11.8     5.6       37.5    11.2
Most Experience       11.7     7.1        14.4     7.3       30.5    13.0
  Diff.               10.4                 2.6                7.0
  t                    5.1                 0.9                1.3

* In years.
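
The comparison of extreme quarters described above can be sketched as follows. The records are invented for illustration and are not the study's data; the field and function names are hypothetical.

```python
from statistics import mean


def extreme_quarters(records, background, score):
    """Mean test score of the lowest and of the highest quarter of the
    group when the men are ranked on one background variable."""
    ordered = sorted(records, key=lambda r: r[background])
    k = max(1, len(ordered) // 4)  # size of each extreme quarter
    low, high = ordered[:k], ordered[-k:]
    return mean(r[score] for r in low), mean(r[score] for r in high)


# Hypothetical salesmen: age in years and score on the salesmen's key.
salesmen = [
    {"age": 20, "sales": 30}, {"age": 22, "sales": 34},
    {"age": 25, "sales": 33}, {"age": 28, "sales": 31},
    {"age": 31, "sales": 36}, {"age": 35, "sales": 29},
    {"age": 40, "sales": 32}, {"age": 45, "sales": 34},
]
youngest_mean, oldest_mean = extreme_quarters(salesmen, "age", "sales")
print(youngest_mean, oldest_mean)
```

If age shaped the interest pattern, the two means would be expected to differ by much more than their sampling error; each pair of means could then be tested with the same t formula used for Table 1.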


VII. VALIDATION: SAMPLE, PROCEDURE, RESULTS


In the construction of personality and interest tests, the
validation of the test is always an important problem. It is
particularly a problem here because the salesmen and the
mechanics differ in their backgrounds, because the only measure we
have that these groups are representative samples of successful
salesmen and mechanics is that they have been employed for
at least a year immediately prior to being tested, and because
these samples may have been biased by the fact that the men
were approached for cooperation first through their employers.

Validation of the test was carried out on a group of men
who had filled out blanks as part of another study.4 The
manner in which this group was selected differed greatly from
the way the criterion groups had been chosen, giving a group
probably free of any biases to which the original group had
been subject. A group of 77 patients in three Eastern
tuberculosis sanatoria was given the test by a fellow patient. The
advantages of this were the following: each man was
approached individually for his cooperation in filling out the test
by someone whom he regarded as an equal rather than as a
superior and who had no connection with his employment;
about 90 per cent of those asked to fill out blanks did so; the
group was fairly homogeneous as to age, education, and
experience; and the men had been employed by a wide range of
employers. The disadvantage of this group was that the men
were in sanatoria and had been there, on the average, for
eighteen months. None of the men was in sick wards, and
most of them were definitely convalescent when tested. The
criterion of success was merely that of having been employed
at the time of becoming sick. As we can see in Table 5, of the
77 in the group, 6 were salesmen, none of them route salesmen,
and 21 mechanics. There were no significant differences
between the mechanics, the salesmen, and the neutrals in age,
education, or experience, although the difference in experience
between the mechanics and the neutrals is probably significant.
Except for the fact that the mechanics had probably been in
a sanatorium a significantly longer time than the salesmen, all
had been in sanatoria about the same length of time. The
validation group of mechanics was significantly less
experienced than the criterion mechanics; in age and education, the
differences were not significant. The validation salesmen were
probably significantly older than the criterion salesmen but
did not differ significantly in education or experience.


4 Churchill and Churchill, ‘‘The Effect of Long Hospitalization on Job
Interests.’’ (In preparation.)


TABLE 5
Personal Data on the Sanatorium Groups

                       Age            Education        Experience       T.I.S.*
               N     Mean   S.D.     Mean   S.D.      Mean   S.D.        Mean
Mechanics     21†    37.6   11.0      9.8    3.1      13.2   10.7        24.4
Salesmen       6     36.5   10.0      9.5    3.3       8.0    4.6        10.1
Neutrals      50     33.0    9.2      9.8    3.2       8.6    7.4        18.0

* Time in sanatorium, in months; others are in years.
† For Education, N = 20.

Of the 21 mechanics, 20 were correctly placed on the
mechanics’ key; 16 on the salesmen’s key; and 15 on the combined
key. Of the 6 salesmen, 6 were correctly placed on the
mechanics’ key, and 5 on both the salesmen’s and the combined
keys. All the misplaced scores on the combined key fall in
the ‘‘neutral’’ range. Thus considering only the salesmen and
the mechanics, 96.3 per cent were correctly placed on the
mechanics’ key; 77.8 per cent, on the sales key; and 74.1 per
cent, on the combined key. On this basis, the mechanics’ key
worked better, and the other two only slightly less well, with
this group than with the criterion groups. Table 6 shows that
the difference between the mechanics and the salesmen on each
key is significant, but that the difference between the mean of
each validation group and that of the corresponding criterion
group is significant only in the case of the mechanics’ scores on
the sales key. As far as mechanics and salesmen are
concerned, the results obtained on the criterion groups are not
peculiar to those groups; the mechanics’ key appears to stand
up better than the salesmen’s key.


TABLE 6
Scores of the Validation Groups on the Three Keys

                                 Mechanics’ key      Sales key       Combined key
                           N      Mean    S.D.      Mean    S.D.     Mean    S.D.
Mechanics                 21      30.6     5.0      15.1     6.6    -15.5    10.6
Salesmen                   6      11.6     4.6      31.2    12.2     20.0    14.9
Neutrals                  50      20.2     9.0      24.1     6.2      4.3    12.8

                                  Diff.     t       Diff.     t      Diff.     t
Mech. vs. Salesmen                19.0    7.76     -16.1    4.13    -35.5    6.52
Mech. vs. Neutrals                10.6    6.87      -9.0    4.11    -19.8    6.25
Sales. vs. Neutrals                8.4    2.42       7.1    1.65     15.7    2.79
Valid. vs. Crit. Mech.             0.7    0.42       5.3    3.12      4.5    1.61
Valid. vs. Crit. Sales.           -1.5    0.51      -1.7    0.31     -0.7    0.10

The test is not so designed that the scores of persons in other
occupations can be readily interpreted on it; one might expect,
however, that they would fall below -1 S.D. on the separate
keys and between -1 S.D. for the salesmen and -1 S.D. for
the mechanics on the combined key. On the mechanics’ key,
32 per cent of the ‘‘neutral’’ group exceeded -1 S.D.; on the
salesmen’s key, 60 per cent exceeded -1 S.D.; on the combined
key, 12 per cent exceeded -1 S.D. of the mechanics’ distribution,
and 38 per cent, -1 S.D. of the salesmen’s distribution. Thus,
‘‘neutrals’’ are often misclassified; they seem better
differentiated from the mechanics than from the salesmen. If these
neutrals are to be adequately classified, a ‘‘Men in General’’
group should be used as the contrast to each individual
occupation.

VIII. SUMMARY AND CONCLUSIONS 


1. The purpose of this study was to construct an interest 
test differentiating mechanics and route salesmen. 

2. Forty-one salesmen and 61 mechanics were tested; the
mechanics were significantly older, less well educated, and
more experienced than the salesmen.

3. Of the original 247 items, involving 864 comparisons, 88 
items, involving 129 comparisons, yielded differences between 
mechanics and the salesmen, the C.R. values of which exceed 
2.56. 

4. Two methods of scoring the test were used: two keys, a 
mechanics’ and a salesmen’s, and a combined key. Both meth- 
ods give the same results. The latter is easier to use, but there 
are theoretical objections to its use. 

5. On all the keys, the salesmen differed significantly from 
the mechanics. No more than 17.6 per cent of the criterion 
groups was misplaced on any key so that the amount of over- 
lapping was small. 

6. A comparison of the youngest and the oldest, the least
and the best educated, the least and the most experienced
salesmen revealed no differences in test scores. Thus age,
education, and experience do not affect the test scores for the
salesmen.

7. The test was repeated on a group of 77 tuberculous 
patients, of whom 6 were salesmen and 21 mechanics. The test 
differentiated the salesmen from the mechanics significantly 
with little overlapping. The 50 men in other occupations were 
not differentiated from the salesmen by the test and were not 
very well differentiated from the mechanics. This resulted 
from the manner in which the test had been constructed. 

8. The test demonstrates the possibility of interest testing 
on the non-professional level of occupations. 
... to a long series of observations of various
psychic effects on the gastric motility of the fasted
human subject (Frick, Scantlebury, and Patterson, 1935;
Scantlebury and Patterson, 1938; Scantlebury, 1938) a study
was made of periods in which dreams occurred. The work of
two investigators which correlates dreams and hunger
activity has been found. Luckhardt (1916), working on dogs,
determined the dreaming state by the bodily movements of
the sleeping animal and concluded that dreams had an
inhibitory effect on the hunger movements of this animal.
Wada (1922), using human subjects, came to the conclusion
that dreams occurred only in the period of contraction.
More recently, McGlade (1942), also working on humans,
showed a relationship between gastric motility and the
muscular twitches of the right foot during sleep and dreaming.
In these experiments, the dream response appeared at a
definite time interval following the ingestion of different foods,
in which the pyloric activity was accompanied by a
characteristic series of small foot twitches in groups of 20 to 50, and
each foot movement was found to coincide with the sound of
the relaxing pyloric sphincter. Evidence is offered to show
that the dreaming occurred at the end of the series of foot
movements and accompanied the complete evacuation of the
stomach.


1 Preliminary reports of this work were given before the American
Physiological Society at Detroit, Michigan, April 11, 1935, and at
Baltimore, Maryland, March 31, 1938, brief abstracts of which were published
in the Proceedings of that society (American Journal of Physiology, 1935,
113, 47; ibid., 1938, 123, 179).

The experiments discussed in this paper should further our 
present knowledge concerning the influence of normal and 
hypnotically induced dreams on the gastric motility of the 
empty stomach of man. 


EXPERIMENTAL PROCEDURE 


The method employed in this investigation is similar to that
described in an article by Scantlebury and Patterson (1941).
Two of the three human subjects, R. E. S. and H. L. F., gave
reports of dreams occurring during normal sleep, and R. E. S.
also reported occurrence of hypnotically induced dreams. The
hypnotism was of the hypotaxic type induced by suggestion
and the focal point methods (Cannon, 1932).


RESULTS AND DISCUSSION 

The authors have found that the exact time during which a
dream occurs is elusive of record. The several dreams
reported by our subjects could be placed in the period of
contraction with reasonable certainty. This confirms Wada’s
findings. When a dream was hypnotically induced, even if
the suggestions were given when the stomach was in motor
quiescence, the dream as exhibited by the gastric records did
not occur until the hunger movements were in actual progress.

It was our custom to ask the subject when he awakened from
a sleep if he had dreamed and to recount the dream in detail
by writing a record as he remembered it. In four of seven
cases of reported dreaming the subject could definitely state
that he had dreamed but, despite concerted effort, could recall
nothing more than the fact that a dream had occurred.
Gastric hunger movements recorded by the balloon and
ink-recording method (Patterson, Scantlebury and Gijsbers, 1935; and
Patterson, 1938) from the stomach of H. L. F. are shown in
Figure 1. These were made following a fast period of eighteen
hours. The subject had been sleeping for twenty-five minutes
but had become restless at point 4, wakened at point 5 and
reported that he had dreamed but was unable to recall the
content of the dream. Note the definite temporary inhibition of
the hunger movements with a marked increase in the height
of the interposed contractions. There was no marked change
in the general tonus level. At point 6 the subject was again
sleeping soundly. In each of the cases where the dream
content was unknown the subject wakened and reported his dream
while contractions were still in progress or following a shorter
than normal activity period. This gave evidence of motility
being inhibited by the dream mechanism.

In Figure 1 it will be noted that the period of activity was 
first evident between points 2 and 3. This places the dream 
stimulus in the first two-fifths of the normal hunger period. 
Thus, we would expect the inhibition to be temporary and the 
hunger contractions to return to normal after the subject 
awakened. 

Of the three dreams reported in which the subject was able
to report some of his experiences, one was especially vivid.
The description given by the subject two and one-half hours 
after waking was as follows: 

I dreamed that I was on a great ship and that it was
part of a parade mounted on a motor truck. This truck 
was going past the main campus of the university. The 
decks of the ship seemed to be crowded with people all 
chasing me to the bow or front. When I was finally as 
far forward as I could go a great crowd of people many 
of them Negroes surged toward the front of the ship and 
started throwing pies. I reached over excitedly to grab 
some of the food for I suddenly realized that I was very 
hungry. After great effort and what seems to me now
as failure, I thought that I was cast out into space and
awoke.


The effect of this dream on the gastric hunger movements 
of the subject, R. E. S., is shown in Figure 2. He appeared 
restless at point 1 on the tracing and at point 2 awakened and 
reported that he had been dreaming. Note the temporary 
inhibition of the hunger movements and the rapidity with
which the normal rhythm returned after the subject wakened. 
This dream of food was, as far as can be determined, entirely 
spontaneous. 

Figure 3 shows the results of one of the hypnotically induced
dreams. R. E. S. was the subject and H. L. F. the
operator. R. E. S. had been in a satisfactory state of light 
hypnotic sleep for twenty minutes. The hypnotism was con- 
summated while the stomach was in a period of motor quies- 
cence. The operator in the process of hypnotism suggested 
that the subject would soon imagine himself walking through 
a fruit orchard at harvest time, that he would have a great 
desire to pick the luscious fruit from the peach trees and eat 
some of it. When R. E. S. awakened at the point marked X
on the tracing he reported a very vivid dream but did not
recall that any such suggestions as outlined above had been
given to him. He reported his dream as follows: 


I was walking through a fruit market in which great 
rows of peaches were displayed on well-ordered racks in 
upright and inverted triangular wooden supports. A 
sensation of extreme hunger accompanied the dream. I
vainly attempted to stretch forth my arm and pick some 
of the fruit from the racks. 


Typical of all our reported dreams, the exact point at which
this dream occurred could not be determined. The tracing
showed that a rather abrupt inhibition of the hunger movements,
followed by a slight lowering of the gastric tonus, was
apparent about four minutes before the return to consciousness.
There is also an absence of the customary incomplete
tetanus ending. The period of contraction had progressed for 
twenty-three minutes before the first indication of inhibition 
occurred. The average length of the normal hunger periods
for the subject R. E. S. was 33 minutes, with a range of
28 to 34 minutes. H. L. F. produced an average hunger activity
period of 20 minutes, with a range of 16 to 34
minutes. This would place the point of stimulus in the second
two-fifths of the normal period length, and we would expect
inhibition to be complete. Point B on the tracing indicates 
a quick body movement at awakening as though the subject 
may have been startled. No external stimuli were noted. 
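
The placement of the stimulus within the period can be checked by simple arithmetic. The sketch below is only an illustration: the function name is hypothetical, and the label for the last fifth of the period is an assumption, since the text names only the first and second two-fifths. The figures of 23 and 33 minutes are those reported above for R. E. S.

```python
def locate_in_period(elapsed_min, normal_length_min):
    """Return the elapsed fraction of a hunger period and the segment it falls in."""
    fraction = elapsed_min / normal_length_min
    if fraction < 0.4:
        segment = "first two-fifths"
    elif fraction < 0.8:
        segment = "second two-fifths"
    else:
        # Label assumed for illustration; the paper does not name this segment.
        segment = "final fifth"
    return fraction, segment


# 23 minutes of contraction before inhibition, 33-minute average period (R. E. S.)
fraction, segment = locate_in_period(23, 33)
print(f"{fraction:.2f} of the normal period -> {segment}")
# prints: 0.70 of the normal period -> second two-fifths
```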

The cause of the inhibition due to dreaming is a matter of 
conjecture. In hypnotized subjects when the eating of food 
is suggested there is a marked increase in salivation and an 
increase in the total quantity as well as the total acidity of 
the gastric juice (Bennett and Venables, 1920). The inhibition
seems to depend upon the total quantity of free acid in
the stomach (Hellebrandt, 1935).

Luckhardt (1916), from his observations on dogs, argues 
that since the hunger contractions are usually stronger during 
sleep because the extero-ceptive stimuli have been eliminated, 
the inhibition due to cerebral activity (dreaming) must of 
necessity be central in origin. While this may be true it does 
not fully explain the mechanism that produces the inhibi- 
tory effect. It may be that (1) the cerebral process of dream- 
ing directly influences the number of impulses passing over 
the splanchnic nerves, or (2) that due to the cerebral activity 
during dreaming the cephalic phase of gastric secretion is initiated.
In the latter case the resulting increase in the total acidity
of the gastric juice would set up the normal inhibitory mechanism
by way of the afferent sensory fibers from the gastric mucosa and
cause reflex inhibition via the splanchnics. It is difficult to determine
whether or not there is a latent period between the stimulus
and the inhibition, since the point at which a dream occurs
is practically impossible to establish by the methods employed.


SUMMARY 


1. Dreams have an inhibitory influence on the stomach move- 
ments during hunger. 

2. The exact point at which dreaming occurs is difficult to 
ascertain. However, it appears evident from the reports of 
Luckhardt (1916) and Wada (1922) and from work in our 
laboratory that dreams occur only in the period of active 
hunger contractions. No dreams were reported by other investigators
or found from our studies during the quiescent
period. 

3. Our work indicates that hypnotically induced dreams,
at least when food is an integral part of the dream, cause an
inhibitory effect to be exhibited as in normal spontaneous
dreaming.

4. The point at which the dream occurs in the hunger period 
determines whether the contractions will return and a normal 
ending result or the period cease abruptly and pass into quies- 
cence. Where the dream could be localized in the early part 
of the period as in Figure 1 the contractions returned follow- 
ing temporary inhibitory effects. If the dream was localized 
in the second two-fifths of the normal period’s length there was 
complete inhibition (Fig. 3). No dreams could be localized 
in the period of incomplete tetanus after the increasing tonus 
had commenced. 

5. The mechanism of the inhibition is a matter of conjec- 
ture. When food is part of the dream content an impulse, 
probably of central origin, initiates the cephalic phase of gas- 
tric secretion. The resulting increase in free acid content of 
the stomach establishes a reflex the efferent pathway of which 
may be the splanchnic fibers to the stomach wall. 
CHANGES IN COLOR FIELDS OCCASIONED
BY EXPERIMENTALLY INDUCED 
ALCOHOL INTOXICATION* 


HENRY B. PETERS 
University of Nebraska 


I. INTRODUCTION 


The purpose of this paper is to present experimental
evidence of the effects of alcohol on the areas of the
retina sensitive to color as measured by the color
fields on a perimeter and to infer certain changes in the
individual.

The basis for the investigation is found in the work of T.
A. Brombach,² Luther Peter,³ and others on the effects of
exogenous and endogenous toxemia, fatigue and glandular 
dysfunction on the color sensitivity of certain areas of the eye. 
The explanations accompanying the experimentally deter- 
mined fields will be based upon their work. The hypothesis 
is that changes in the body chemistry are manifested by 
changes in the areas of the retina sensitive to color. Past 
experiments and research have established this hypothesis 
fairly conclusively. 

The effects of alcohol are of the exogenous toxic type and
may be studied in relation to other exogenous toxemias. An
exogenous toxemia, that is, intoxication, is caused by any toxic
substance taken into the body from the environment. A substance
may be considered toxic (poisonous) when its administration
in small dosages is followed by damage to the organism,
that is, physio-chemical or physiological reactions caused by
the action of the substance on the body cells causing pathological
and symptomatic changes in function and/or tissue.


1 The data for this experiment were collected during April, 1939, at the
University of Nebraska with the advice of D. A. Worcester, Chairman
of the Department of Educational Psychology and Measurements.

2 Brombach, T. A., Visual Fields, published by the Distinguished Service
Foundation of Optometry, 1930.

3 Peter, L., Principles and Practices of Perimetry, Philadelphia: Lea &
Febiger, 1923.

The normal size and relation of the various fields must be 
considered before abnormality can be detected. The normal 
relation of the various fields as measured on a perimeter in 
order of size is: motion (largest), form, blue, red, and green. 
Toxic conditions, metabolic variations, and other factors
affecting the condition of the retinal chemistry may cause 
variations in the relative size and order of the fields. 
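
The normal order of the fields lends itself to a mechanical check. The following is a minimal sketch, assuming that each field is recorded as a limit in degrees from the fixation point on each meridian; the function name, the data layout, and the sample readings are all hypothetical and are not taken from the charts of this experiment.

```python
# Normal order of field size, largest first, as stated in the text.
NORMAL_ORDER = ["motion", "form", "blue", "red", "green"]


def out_of_order(extents):
    """List (inner, outer, meridian) wherever a normally smaller field
    reaches farther than the field that normally encloses it.

    extents maps field name -> {meridian in degrees: limit in degrees}.
    Only adjacent fields in NORMAL_ORDER are compared.
    """
    found = []
    for outer, inner in zip(NORMAL_ORDER, NORMAL_ORDER[1:]):
        shared = sorted(set(extents.get(outer, {})) & set(extents.get(inner, {})))
        for meridian in shared:
            if extents[inner][meridian] > extents[outer][meridian]:
                found.append((inner, outer, meridian))
    return found


# Hypothetical readings on two meridians (illustrative values only).
readings = {
    "blue": {90: 40, 180: 55},
    "red": {90: 45, 180: 45},
    "green": {90: 30, 180: 35},
}
print(out_of_order(readings))
# prints: [('red', 'blue', 90)]
```

A result such as the one above, red extending beyond blue in the vertical meridian, corresponds to what the text calls an overlapping of the red field over the blue.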

Past investigation has shown that the size and shape nor- 
mally are influenced by the following factors: the bony promi- 
nences of the face and shape of the orbital cavity, activity of 
the retina, width of the palpebral fissure (vertical), state of 
refraction, illumination, size of the pupil, hour of the day, 
and the accuracy of the operation. Variations from the normal 
may be caused by glandular disbalance, metabolic disturbances, 
toxic conditions, etc.

The stages of intoxication with the accompanying changes 
may generally be said to be: 


a. Stimulative Stage (not present in alcohol).
1) Fields for red, green and blue enlarged.
2) Green interlacing or overlapping red.
3) High ductions and reserves.
4) No pathological indications.
5) No symptomatic indications.


b. Depressive Stage.
1) Slight contraction of color fields.
2) The red field interlaces or overlaps the field for blue.
3) Ductions and reserves low.
4) Symptoms of discomfort and fatigue.
5) No pathological indications.


c. Degenerative Stage.
1) Marked contraction of fields for blue, red, and
green, and in later stages collapse of form field.
2) Possible central relative scotoma for green and later
a toxic amblyopia.
3) Ductions and reserves very low and only measurable
before amblyopia.
4) Marked symptoms of discomfort, nausea, and loss
of vision as scotoma develops.


The form and motion fields usually remain unimpaired in 
intoxication though the form field may become contracted in 
degenerative stages. 


II. EXPERIMENTAL PROCEDURE 


For this experiment a male college student acted as subject. 
He was accustomed to drinking beer in moderate amounts. 
The normal size of the subject’s visual fields was determined 
over the range of time of the experiment, two o’clock until five 
o’clock, on the day preceding the administration of the alcohol 
and found to be normal. 

The fields for each eye were determined again at the start 
of the experiment and were found to be normal in size and 
relation. Then twelve ounces of beer, one can, containing six 
per cent alcohol by volume were given the subject. The fields 
were measured again and so on; one can of beer every half 
hour with measurement in between. The first field was taken 
at two o’clock, the last at five-forty after a total of seven cans 
of beer, eighty-four ounces, had been consumed. 

A ten-minute period was allowed for drinking and then the 
fields for the right eye were measured. This was followed 
immediately by the measurement of the fields for the left eye. 
The resulting changes in the fields are noted. The period of 
measurement was ten minutes for each eye. Thus the entire
procedure was repeated once every half hour. 
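
The dosage and timing described above can be tabulated as a check on the totals. The sketch below is illustrative only: the names are hypothetical, and the clock times assume that the first post-beer measurement of the right eye fell at 2:30 p.m. on April 29, as the chart captions indicate.

```python
from datetime import datetime, timedelta

CAN_OZ = 12               # ounces per can, from the text
ALCOHOL_FRACTION = 0.06   # six per cent alcohol by volume, from the text
CANS = 7                  # total cans consumed, from the text


def schedule(first_right_eye=datetime(1939, 4, 29, 14, 30)):
    """Return (can number, cumulative ounces, right-eye time, left-eye time)."""
    rows = []
    for n in range(1, CANS + 1):
        right = first_right_eye + timedelta(minutes=30 * (n - 1))
        left = right + timedelta(minutes=10)  # left eye follows the right eye
        rows.append((n, n * CAN_OZ, right.strftime("%H:%M"), left.strftime("%H:%M")))
    return rows


rows = schedule()
for cans, ounces, right, left in rows:
    print(f"can {cans}: {ounces} oz total, right eye {right}, left eye {left}")

total_oz = CANS * CAN_OZ
print(f"alcohol consumed: {total_oz * ALCOHOL_FRACTION:.2f} oz of {total_oz} oz")
```

On these assumptions the last line of the table falls at 5:30 and 5:40 p.m. with eighty-four ounces consumed, in agreement with the text, and the alcohol itself amounts to about five ounces.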

Twenty-four hours after the start of the experiment, the 
final pair of color fields were taken.


III. THE EXPERIMENTALLY DETERMINED COLOR FIELDS


Fig. 1. The shape and size of the visual fields are deter- 
mined in part by the shape and size of the orbit and bony 
prominences of the face. The left diagram indicates the limits 
of the orbit in a typical individual. 

The right diagram shows the key for designating the area 
and limits of the various visual fields. It is to be understood 
that all the area, except the blind spot, from the limiting line 
to the fixation point is sensitive to the particular stimulus. 

Fig. 2. Visual fields taken at 2:00 p.m. for the right eye 
and 2:10 p.m. for the left eye on Friday, April 28. The field 
for the left eye will always appear on the left side and will 
be represented as the patient, not the observer, sees it. These 
fields indicate large fields in normal relation. 

Fig. 3. Visual fields taken at 5:00 p.m. for the right eye 
and 5:10 p.m. for the left eye on Friday, April 28. Figure 2 
and Figure 3 taken together indicate the normal variation over 
the time of the experiment. The fields are in normal relation 
and large in size. The slight contraction in almost all meridians
is assumed to be due to the effects of fatigue, but since it
is so small it will not materially interfere with or influence the
results. Part of the afternoon was spent in reading and part
in outdoor recreation.

Fig. 4. Visual fields taken at 2:00 p.m. for the right eye 
and 2:10 p.m. for the left eye on Saturday, April 29. These 
were taken at the start of the experimental period before any 
beer had been administered. The visual fields were again 
found to be in normal relation and large in size. They are very 
similar to the ones taken at the same hour on the previous day. 

Fig. 5. Visual fields taken at 2:30 p.m. for the right eye 
and 2:40 p.m. for the left eye on April 29 immediately follow- 
ing the administration of twelve ounces of beer. Very little of 
significance is indicated except that mild variations in shape 
and size of fields are present. They retain the same general 
size and the relations between them are normal. 

Fig. 6. Visual fields taken at 3:00 p.m. for the right eye 
and 3:10 p.m. for the left eye on April 29 immediately following
the administration of twelve ounces of beer making a total
of twenty-four ounces consumed prior to this measurement. 
There are mild variations in shape indicated, but the relations 
between the fields are still normal. There is a rather marked 
contraction of both the field for red and the field for green 
shown in each eye while the other fields remain approximately 
the same size. 

Fig. 7. Visual fields taken at 3:30 p.m. for the right eye 
and 3:40 p.m. for the left eye on April 29 immediately follow- 
ing the administration of twelve ounces of beer making a total 
of thirty-six ounces consumed prior to this measurement. The 
right eye indicates rather noticeable changes in shape, though 
the fields still retain their normal relations. There is also a 
contraction of the field for blue and enlargement of the field 
for red in the superior nasal quadrant. In the left eye there 
is a definite enlargement of the field for red, and it is coinci- 
dent with the field for blue in the vertical meridian. This 
indicates the development of a depressive stage of exogenous 
toxemia. 

Fig. 8. Visual fields taken at 4:00 p.m. for the right eye 
and 4:10 p.m. for the left eye on April 29 immediately follow- 
ing the administration of twelve ounces of beer making a total 
of forty-eight ounces consumed prior to this measurement. 
The right eye shows enlarged fields for red and green over the 
immediately preceding measurements, and the fields are still 
in normal relation. The left eye, however, shows the first overlapping
of the red over the blue, and this is in the vertical
meridian. This indicates the first stage of the depressive toxic 
effect of alcohol. 

Fig. 9. Visual fields taken at 4:30 p.m. for the right eye 
and 4:40 p.m. for the left eye on April 29 immediately follow- 
ing the administration of twelve ounces of beer making a total 
of sixty ounces consumed prior to this measurement. The right 
eye also shows the first stages of the depressive toxic effect of 
alcohol. The red overlaps the blue in the vertical meridian.
There is an enlargement of the field for red and a sectoral
contraction of the field for blue. The left eye indicates a 
marked overlapping of the red field over the blue field in the 
superior portion of the field—45°, 90°, and 135°. There is 
also a marked enlargement of the red and green fields and a 
contraction of that for blue. 

Fig. 10. Visual fields taken at 5:00 p.m. for the right eye 
and 5:10 p.m. for the left eye on April 29 immediately follow- 
ing the administration of twelve ounces of beer making a total 
of seventy-two ounces consumed prior to this measurement. 
The enlargement of the red and contraction of the blue is very 
marked, showing the advanced stages of a depressive toxemia 
in both the right and left eyes. This was accompanied by the 
behavior pattern of moderate drunkenness. 

Fig. 11. Visual fields taken at 5:30 p.m. for the right eye 
and 5:40 p.m. for the left eye on April 29 immediately follow- 
ing the administration of twelve ounces of beer making a total 
of eighty-four ounces consumed prior to this measurement. 
Both eyes indicate the collapse of the color fields with red still 
overlapping the blue. The form and motion fields are still 
intact. This marked collapse indicates the first stages of a 
degenerative toxic condition. This condition was accompanied 
by the behavior pattern of drunkenness. This condition is 
usually followed by the collapse of the form field and a rela- 
tive scotoma for green. The experiment was discontinued 
because of the condition of the patient. 

Fig. 12. Visual fields taken at 2:00 p.m. for the right eye 
and 2:10 p.m. for the left eye on Sunday, April 30. These 
fields were taken twenty-four hours after the start of the experimental
period. In spite of the residual or ‘‘hang-over’’ effects
on the subject, the visual fields in both eyes have returned to
normal size and relation.


IV. CONCLUSIONS 


From the preceding color field charts representing the areas 
on the retina sensitive to various wave lengths of light and 
from comparisons of these with other series of other toxic
conditions, it may be concluded that the color fields furnish 
further evidence that alcohol is a depressant and not a stimu- 
lant. It will be noted that no stimulative stage of exogenous 
toxemia was attained, but that the fields showed normal rela- 
tions until the depressive stage was manifested. 

As in other exogenous toxemias, when the toxic agent is 
eliminated for a period of about twenty-four hours, the color 
fields are restored to normal relation. 

This is but a single case, and the tolerance for alcohol and 
other toxic agents varies greatly with individuals, so no con- 
clusions should be drawn with respect to the quantity neces- 
sary to produce these changes. The changes will undoubtedly 
occur with sufficient administration of alcohol, but the time of 
appearance will depend on the individual tolerance or ab- 
sorptive capacity. 





SCORING FORMULAE FOR A MODIFIED 
TYPE OF MULTIPLE-CHOICE 
QUESTION 


With Reference to Item Form, Instructions,
and Machine Scoring*


LLOYD V. SEARLE 
University of California 


The multiple-choice question is ordinarily phrased as an
incomplete statement, or as a direct question, followed
by a number of alternative choices, which will plausibly
complete the statement or answer the question. In most objective
tests which employ the multiple-choice form, the number
of alternatives intended as correct in each question is made
constant throughout the test, and examinees are instructed to
choose the ‘‘best answer’’ or ‘‘best two answers,’’ etc. The 
most common practice is to supply four, or five, choices to each 
question and make one of these correct, the number of choices 
being also made constant when the scoring formula to be used 
involves a correction for chance successes. Many variations 
of this practice are, of course, possible, and several have been 
discussed at length in the educational textbooks. 

The present report is concerned with a modified form which 
differs primarily in that the number of correct alternatives is 
made variable in successive questions, and examinees are not 
told how many are correct in each. This form of question is 
not original with the writer and is possibly familiar to many 
test constructors elsewhere, though it seems to have missed 
attention in the treatises on scoring and administration of 

*The writer is indebted to Professor Warner Brown for suggesting 
some of the topics of this report, and for his criticisms during its 
preparation. 
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objective test questions. It was introduced at the University 
of California a few years ago and has since been used success- 
fully in course examinations. For this purpose, where labor 
of test construction and time of administration are often im- 
portant factors, it has been found superior to the conventional 
form in a number of respects. It has the particular advantage 
of allowing greater freedom in the selection of plausible alter- 
natives, and is easily adaptable to various kinds of subject 
material. Its principal virtue, to the statistically-minded ex- 
aminer, is that each single alternative can be made a differen- 
tiating unit in the scoring of the test, a fact which makes for 
a substantial increase in the possible range of scores over that 
of a test employing conventional multiple-choice questions of 
comparable length. 

Because of the potential value of the form for use in achieve- 
ment and other mental tests generally, the writer has under- 
taken to analyze its properties with respect to scoring and 
administration. Though it seems a rather obvious and simple 
variation from the conventional form, its use has been found 
to introduce important differences in the testing situation 
which must be taken into account in order to attain the maxi- 
mum test validity. The purpose of this paper is to indicate 
the nature of these differences, and to present briefly a ra- 
tionale of the scoring and instructions which have been 
adopted. 


DESCRIPTION OF THE QUESTION FORM 


Some illustrative questions taken from tests in advanced 
psychology courses are reproduced below. The exact formula- 
tion of questions involving the same principles may, of course, 
be varied in a number of ways. A similar arrangement is 
possible for certain kinds of matching items and, as will be 
later discussed, is also adaptable for true-false questions. The 
correct alternatives are those with numbers enclosed in paren- 
theses. 


17. In general, rating scale methods have been preferred to
the method of paired comparisons because they
1. are more objective.
(2.) require less time.
(3.) have a wider range of application.
4. are more reliable.
5. eliminate the halo effect.


Which of the following factors influencing a coefficient of
correlation operate(s) systematically to lower the obtained
value?
(1.) random errors of measurement.
2. errors of random sampling of individuals.
3. increased range of talent.
(4.) grouping errors.
(5.) curvilinear regression.


Studies reviewed in class showed tendencies toward female
superiority with regard to
1. arithmetical reasoning.
(2.) scholastic achievement.
3. spatial aptitude.
4. mechanical aptitude.
5. emotional stability.


A typical one-hour examination contains thirty to forty 
questions similar to those above, though the number varies 
depending upon level of difficulty, average number of alterna- 
tives, and other factors. The scoring method to be described 
does not require that each question have five alternatives, nor, 
in fact, that the number of alternatives in successive questions 
be constant; questions containing as few as one choice, though
uneconomical, do not affect instructions or scoring. In general,
though the number of correct alternatives in individual
questions may vary from zero to five (or more, for longer
questions), the number of correct alternatives should be
approximately equal to the number of incorrect alternatives
in the examination as a whole. To discourage
the use of extraneous cues it is desirable to randomize
the pattern of correctness, including an occasional question
which has either none or all of the alternatives correct.
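
One way of planning such a pattern is sketched below. It is an assumption for illustration, not the procedure used by the writer: each alternative is designated correct with probability one-half, independently of the others, so that correct and incorrect alternatives balance on the average and an occasional question has none or all correct. The function name and the fixed seed are hypothetical.

```python
import random


def make_key(num_questions, alternatives=5, seed=0):
    """Designate each alternative correct with probability 1/2, independently."""
    rng = random.Random(seed)  # fixed seed so the plan can be reproduced
    return [[rng.random() < 0.5 for _ in range(alternatives)]
            for _ in range(num_questions)]


key = make_key(40)
per_question = [sum(q) for q in key]          # number correct in each question
correct = sum(per_question)
incorrect = sum(len(q) for q in key) - correct

print("correct:", correct, "incorrect:", incorrect)
print("questions with none correct:", per_question.count(0))
print("questions with all correct:", per_question.count(5))
```

If a particular draw leaves the totals too unequal, or contains no question of the none-correct or all-correct kind, the plan can simply be redrawn with another seed.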


SCORING 


For reasons of convenience the description of scoring pro- 
cedure will be based on the use of machine answer sheets of 
the types which are supplied in printed form by the International
Business Machines Company. These are generally
familiar to the constructors of mental tests, and the scoring 
formulae used with the machine are identical to those used 
with manual and other methods of scoring when responses to 
the questions have been made according to the same general 
plan. 

The procedure of recording choices in the present case is 
similar to that of the ordinary multiple-choice examination. 
Standard 5-choice answer sheet forms are used, the students 
being instructed to mark the space corresponding to each alter- 
native judged correct, and to leave unmarked the space for 
each alternative judged incorrect. 

The scoring of answer sheets involves the use of a key sheet, 
in which holes are punched at positions corresponding to the 
correct alternatives. When the machine is adjusted for scor- 
ing a given answer sheet, the dial will register as Right all of 
the marks in the punched positions and as Wrong all marks in 
any other positions. It is also possible to obtain direct dial 
readings of Right minus Wrong and of Right plus Wrong. 
Thus, the machine is sensitive only to the numbers of marks 
on an answer sheet, and does not register for positions in which 
no marks appear. This fact is significant in scoring answer 
sheets of the type in question because some of the positions are 
correctly, and some incorrectly, left unmarked, and the pro- 
portions of these may vary among answer sheets. To ignore 
these unmarked components would result in reducing the num- 
ber of effective items (i.e., single alternatives) in the test, and 
would consequently lower the reliability. A method of taking 
them into account, based on complementary relationships with 
the marked items, has been derived as follows: 

Considering the answer sheet of a given individual, four
kinds of responses can be distinguished. Let

R = the number of marked-correct alternatives
R’ = the number of unmarked-incorrect alternatives
W = the number of marked-incorrect alternatives
W’ = the number of unmarked-correct alternatives

and, with respect to the key,

C = the number of correct alternatives
I = the number of incorrect alternatives

As previously noted, only the R and W responses (i.e., the
marked ones) are involved in the scores which are obtainable
on the machine. For reasons to be discussed in a later paragraph,
the desirable score is the total number of correct responses,
that is, R + R’. It can be seen that

C = R + W’
I = R’ + W

Adding these two equations results in an expression of the
total answer sheet responses in terms of the total keyed alternatives:

R + R’ + W + W’ = C + I

Substituting for C its value in terms of answer sheet responses,

R + R’ + W + W’ = R + W’ + I

Rearranging terms and cancelling W’,

R + R’ = R - W + I

From the last equation it is seen that the formula, R - W + I,
may be used as a means of obtaining indirectly the total number
of correct responses, including those of leaving unmarked
the items which are incorrect by the key. Use of the formula
involves setting the machine to read R - W, and adding to each
score the value I, which is a constant for all answer sheets in
a given test. The addition must be separately performed,
since no provision for additive constants is made in the adjustments
allowed by the machine.
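
The identity can be verified on a small example. In the sketch below the key follows the three sample questions given earlier, renumbered 1 to 3 for convenience; the set of marks is hypothetical, as are the function and variable names.

```python
def tally(all_alternatives, correct, marked):
    """Count the four kinds of responses for one answer sheet."""
    R = len(marked & correct)                         # marked-correct
    W = len(marked - correct)                         # marked-incorrect
    W_prime = len(correct - marked)                   # unmarked-correct
    R_prime = len(all_alternatives - correct - marked)  # unmarked-incorrect
    return R, R_prime, W, W_prime


# Three questions of five alternatives each, as (question, alternative) pairs.
all_alternatives = {(q, a) for q in range(1, 4) for a in range(1, 6)}
# Key taken from the sample questions: (2, 3), (1, 4, 5), and (2).
correct = {(1, 2), (1, 3), (2, 1), (2, 4), (2, 5), (3, 2)}
# Hypothetical marks made by one examinee.
marked = {(1, 2), (1, 4), (2, 1), (2, 4), (3, 2), (3, 5)}

R, R_prime, W, W_prime = tally(all_alternatives, correct, marked)
C = len(correct)
I = len(all_alternatives) - C

assert R + W_prime == C and R_prime + W == I
assert R + R_prime == R - W + I   # the machine reading R - W, plus the constant I
print(R, R_prime, W, W_prime, R - W + I)
# prints: 4 7 2 2 11
```

Here the machine would read R - W = 2, and the addition of I = 9 gives 11, the number of alternatives judged correctly.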

The addition of a constant to all scores does not, of course,
affect the relative status of any individual, and could therefore
be omitted for most statistical purposes. Scores obtained
directly by use of the formula, R - W, would correlate perfectly
with scores obtained by counting the R + R’ responses.
In cases where examinations are given to statistically naive
students, however, it has been found advisable to report scores
based on the total number of correctly judged alternatives and
not to explain the method by which they were obtained, since
students tend to interpret the subtraction of wrongs as a correction
for ‘‘guessing.’’


CORRECTION FOR GUESSING 


From the preceding outline of the scoring method it can be 
deduced that a correction for the effects of guessing is not 
serviceable. The situation is analogous to that of the true- 
false examination in which students have been instructed to 
mark every item, in that the total number of responses is con- 
sidered and treated as constant for all individuals. Though 
an estimated ‘‘true’’ score could be obtained by subtracting the 
number of wrong (W+W’) responses from the number of 
right (R + R’), no advantage would be gained as these ‘‘true’’ 
scores would correlate perfectly with the number right or with 
the number wrong. 
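
The perfect correlation follows from the constancy of the total number of responses. Writing N = C + I for that total (N is a symbol introduced here for convenience and does not appear in the original notation), the number of wrong responses is N - (R + R'), so that

```latex
\[
\begin{aligned}
(R + R') - (W + W') &= (R + R') - \bigl[\,N - (R + R')\,\bigr] \\
                    &= 2\,(R + R') - N .
\end{aligned}
\]
```

The ‘‘true’’ score is thus a linear function of the number right, and ranks every individual exactly as the number right does.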

By analogy with the practice commonly followed for multiple-choice
tests, use of the formula, R - W, might at first
seem reasonable (since the numbers of right and wrong alter- 
natives are equal on the average) as a means of removing the 
chance element when instructions have been given not to guess. 
Such instructions, however, would almost certainly lower the 
validity of the obtained scores. A student who followed the 
directions faithfully would tend to have a reduced number of 
both R and W responses, since together these comprise the 
total of his marked alternatives. The reduction would consist 
wholly in not marking a number of items of whose correctness 
the student felt only partial certainty. Of these, the number 
of actually correct ones should be in the same proportion as 
among those for which he expressed complete certainty.¹ Insofar
as his knowledge of the subject was greater than zero (i.e., R
greater than W), his R score would be reduced by a greater
absolute amount than W, and his R − W score therefore further
decreased below that of the student who ignored the directions
and reacted on the basis of partial certainty. It may be noted,
then, that instruction not to guess is inconsistent with any
reasonable method of scoring the answer sheets, whether by
formula R, R − W, or R − W + I.

¹ This statement is based on the principle that better than chance
judgments (of correctness as the subject knows it) can be made within the
region of uncertainty. It is assumed that the actually incorrect
alternatives have not been so worded that partial knowledge increases the
probability of their being judged correct.
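
The argument can be checked with a small arithmetic sketch. The item counts and the 90 per cent success rate below are hypothetical figures chosen for illustration. Following the text, the sketch assumes that alternatives marked under partial certainty are correct in the same proportion as those marked under complete certainty.

```python
def r_minus_w(right: int, wrong: int) -> int:
    """Formula score: correctly marked minus incorrectly marked alternatives."""
    return right - wrong


def split_marks(n_marked: int, proportion_right: float) -> tuple[int, int]:
    """Split a number of marked alternatives into (right, wrong) counts."""
    right = round(n_marked * proportion_right)
    return right, n_marked - right


# Hypothetical student: 40 alternatives he is sure of, 20 he is only
# partly sure of, and the same 90% success rate in both groups.
CERTAIN, UNCERTAIN, P_RIGHT = 40, 20, 0.9

r_certain, w_certain = split_marks(CERTAIN, P_RIGHT)
r_unsure, w_unsure = split_marks(UNCERTAIN, P_RIGHT)

# Obeys "do not guess": marks only the alternatives he is sure of.
obedient = r_minus_w(r_certain, w_certain)

# Ignores the instruction: also marks the partly certain alternatives.
ignoring = r_minus_w(r_certain + r_unsure, w_certain + w_unsure)

print(f"follows the instruction: R - W = {obedient}")
print(f"ignores the instruction: R - W = {ignoring}")
```

With these figures the student who follows the instruction loses 18 rights but only 2 wrongs, so his R − W score falls below that of the student who marks on partial certainty.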

The procedure of recording answers could, of course, be 
modified in such a way as to permit making the correction for 
chance successes. One solution would be to number the alter- 
natives consecutively throughout the test and use true-false 
answer sheets, each alternative being given the status of a 
true-false question; if each alternative were separately marked 
as correct or incorrect, no mark indicating lack of decision, 
then the formula, R − W, might be used to obtain the ‘‘true’’ 
scores. Such a procedure would involve some sacrifice of sim- 
plicity and economy, however, and does not seem desirable. 
The ‘‘variable-correct’’ aspect of this type of test is itself an 
effective factor in reducing the examinee’s expectation of 
chance successes. In spite of the fact that the theoretical 
chance proportion is actually increased over that of the con- 
ventional multiple-answer test, students have proved to be 
primarily concerned with not knowing how many alternatives 
in any given question are correct; they frequently report feel- 
ing deprived of extraneous cues and forced to base their an- 
swers upon actual knowledge of the items. From a pedagogi- 
cal standpoint this appears to be a ‘‘good’’ attitude, and is of 
possibly greater value than the debatable correction. 


PRE-EXAMINATION INSTRUCTIONS 


It is apparent from the above considerations that examinees 
should be cautioned to consider each alternative independently, 
and to make their responses consistent with ‘‘best guesses’’ in 
cases of uncertainty. It is also advisable to inform them that 
the number of correct alternatives in the examination as a 
whole is about fifty per cent, but is variable among individual 
questions. By analyzing distributions of part-scores in a 
number of examinations, the writer has found that careful
instructions will overcome the net effects of certain response 
tendencies which seem to be carried over from experience with 
the more familiar multiple-choice situation. The over-conser- 
vative student fears to mark any item unless he is certain that 
it is correct, and others tend to judge the alternatives on a 
comparative and selective basis, marking only those which 
appear to be ‘‘best answers.’’ The effect of such tendencies is 
similar to that of instruction not to guess, and can be reduced 
by emphasizing that failure to mark a correct alternative is as 
much an error as marking an incorrect one. 


TEST CONSTRUCTION 


It may be observed that one commonly used type of multiple- 
choice question is not suitable for the treatment suggested 
above. Choices are sometimes expressed in such a way that if 
one is judged to be correct, the others are obviously incorrect, 
as, for example, among alternative solutions to a problem in
arithmetic. The foregoing discussion does not apply to
multiple-choice questions of this variety, as they are ideally suited 
to the conventional multiple-choice methods of scoring and in- 
struction. In general, the inclusion of mutually-exclusive 
alternatives in a test of the present type should be avoided, 
since they would not carry weights comparable to those of 
other items in the test. A number of standardized tests are 
now in use, however, which contain questions almost identical 
to the examples given above except that only one choice is made 
correct in each question; modification of these along the lines 
suggested would almost certainly result in increased relia- 
bilities. 

One characteristic of multiple-choice tests in general is that 
the danger of ‘‘over-sampling’’ small areas of a heterogeneous 
subject matter is greater than with true-false tests. The diffi- 
culty is minimized, obviously, by a reduction in the average 
number of alternatives and consequent increase in the number 
of questions. It is also possible in the present case to adapt 
true-false items to the multiple-choice plan, as the principles 
of scoring and instruction are essentially identical. It will 
often be found that true-false items dealing with a fairly exten- 
sive subject matter can be grouped into a relatively small num- 
ber of categories, and advantage can be taken of this fact to 
increase the flexibility of the test. The few examples follow- 
ing will serve to illustrate some convenient methods of group- 
ing, and may incidentally indicate that fine degrees of transi- 
tion from the multiple-choice to true-false form are possible: 


2. Research studies of non-conformity have shown that

    1.   our penal institutions, though defective, do tend to
         lessen the criminal tendencies of the inmates.
    (2.) more than 80 per cent of those who commit felony
         crimes are not caught for punishment.
    3.   there is a higher incidence of crime among the
         foreign-born than among the native white population.
    (4.) family discord and instability are definitely related to
         juvenile delinquency.


7. The taboo

         is a realistic precaution which takes account of the
         safety of the group.

         always appears in negative form.

         necessarily has social as well as biological reasons for
         its existence.

         owes its efficacy to fear of the social consequences of
         its violation.


9. Ritual, according to accepted usage of the term,

    (1.) has as its enduring basis the satisfaction of emotional
         needs.
    2.   includes acts of routine provision for physical
         necessities.
    (3.) is essentially non-logical.
    (4.) is habit-forming.





NEWS AND NOTES 


The American Red Cross needs hundreds of social welfare workers and 
educators during the coming year to perform services to the military units 
both in this country and abroad. Among the new employees needed are 
men to serve as field directors at the military and naval centers, to counsel 
and advise men in the service regarding personal and family problems. 
Men are needed as assistant field directors for recreation, to serve with the 
task forces overseas, qualified to plan, organize and promote recreational 
activities such as sports, games, social recreation, entertainments, arts 
and crafts, music, dramatics, and game rooms. Both men and women 
are needed for club directors, program directors, staff assistants to operate 
clubs in leave areas overseas, some who qualify through executive or ad- 
ministrative experience comparable to the operation of a large community 
center, and others who qualify through recreation training and experience. 
Women medical and psychiatric social workers, case workers and recre- 
ation specialists are needed in military and naval hospitals both here 
and abroad. 

Those assigned to service in this country will receive from $135 to $200 
per month; those stationed outside the United States receive from $150 
to $275 plus an additional $50 per month maintenance allowance in mili- 
tary centers and full maintenance in club work. Those interested in re- 
ceiving further information or in making application for a position in 
the American Red Cross Services to the Armed Forces program should 
communicate with: Personnel Service, National Headquarters, American 
Red Cross, Washington, D. C. Those interested in a position within the 
United States only should apply to the nearest Red Cross area office. 
They are as follows: North Atlantic Area, 300 Fourth Avenue, New York 
City; Eastern Area, 615 N. St. Asaph Street, Alexandria, Virginia; Mid- 
western Area, 1709 Washington Avenue, St. Louis, Missouri; Pacific Area, 
Civic Auditorium, San Francisco, California. 


The following open letter is published by a group of prominent
psychiatrists and clinical neurologists for the benefit especially of the
many parents who have sons of 18 or 19 years of age who may be called to
service:


So much has been said and so much implied about the desirability 
of drafting 18- and 19-year-old men for military service from the 
viewpoint of emotional stability that it seems that in the public inter- 
ests a simple, direct statement should be made on this question. 
ates as individuals we wish to assure the public and parents 
of this age group that there are no grounds for apprehension as to 
the effect of military service on these younger men as distinguished 
from the older men. Such statistics as are available indicate that 
the incidence of mental breakdowns is no greater in the 18- and
19-year age group than in the older group. If anything it is somewhat 
less. It would seem to us that the proposal now before the American 
Congress does not unduly compromise the future mental integrity of 
this particular age group or of the nation. With the government 
realizing and properly assuming this increased responsibility, we 
endorse favorable action upon the proposal to include men of 18 and 
19 years under the Selective Service Act. 


This letter was signed by the following: Adolf Meyer, professor 
emeritus of psychiatry, Johns Hopkins University; C. Macfie Campbell, 
professor of psychiatry, Harvard University; Foster Kennedy, professor 
of neurology, Cornell University; C. Charles Burlingame,
psychiatrist-in-chief, Neuro-Psychiatric Institute, Hartford, Conn.; Edwin G. Zabriskie, 
professor of clinical neurology, Columbia University; Winfred Overholser, 
superintendent, St. Elizabeth’s Hospital, Washington, D. C.; S. Barnard 
Wortis, professor of psychiatry, New York University; Tracy Putnam, 
professor of neurology, Columbia University; Oscar Diethelm, professor 
of psychiatry, Cornell University. 


Dr. Doncaster G. Humm, personnel consultant and co-author of the
Humm-Wadsworth Temperament Test, addressed the Psychology-Philosophy
Section of the Southern California Junior College Association at 
Los Angeles City College on October 17, 1942. His topic was ‘‘The Role 
of Judgment in the Administration of an Industrial Testing Program.’’ 


Dr. Richard W. Husband has recently joined the Research Division of 
the Industrial Relations Department of Carnegie-Illinois Steel Corpora- 
tion, Pittsburgh, Pa. 


The Society for Personnel Administration held its annual business 
meeting and dinner on August 24, 1942, at 6:30 P.M. at the Y.W.C.A., 
Barker Hall, Washington, D.C. The speaker for this occasion was Gordon 
Clapp, General Manager of the Tennessee Valley Authority. His subject, 
‘‘What Makes Workers Want to Work, The Real Source of TVA’s 
Power,’’ was termed ‘‘a nine-year story of watts and workers told in forty 
minutes.’’ 


The University of Minnesota is offering unusual opportunities for train- 
ing in personnel work for government offices and the armed forces. Being 
located in a metropolitan center adjacent to the state capital, a number of 
state and municipal departments and state or regional offices of the 
national government are available for both information and field-work ob- 
servation. Fort Snelling, an army reception center, is within a few miles 
of the campus, as are a number of defense plants. In addition to the 
above, the University Testing Bureau is an agency for the educational 
and vocational counseling of both college students and adults and for 
student personnel research. Other examples of campus facilities are the 
Municipal Reference Bureau, the University Committee on Educational 
Research, and the Employment Stabilization Research Institute, an agency 
for conducting research and developing techniques in labor market analy- 
sis and occupational adjustment. 


The Employment Stabilization Research Institute of the University of 
Minnesota has conducted a number of special surveys relating to wartime 
problems, such as housing of war workers, civilian morale, priorities un- 
employment, and wartime transportation. A survey of migration to St. 
Paul was begun at the request of several agencies concerned with the 
wartime mobilization of manpower. The results of this survey showed 
that there were approximately 6,800 persons 14 years old and over and 
1,000 persons under 14 who had resided in St. Paul for six months or 
less. The occupational distribution of the recent arrivals differs from 
St. Paul workers in general in that a larger proportion of recent arrivals 
are in domestic service and other service work, and a smaller proportion 
in the professions, clerical and sales work, skilled crafts, and semi-skilled 
work. The industrial distribution of the recent arrivals differs from 
St. Paul workers in general in that a larger proportion are engaged in 
construction and personal service and a smaller proportion in manu- 
facturing; transportation, communication, and public utilities; and whole- 
sale and retail trade. 


On September 29 and 30, 1942, the American Management Association 
held a conference on Manpower and War Labor Problems at Hotel
Pennsylvania, New York City. Allocation of labor, stabilization of wages,
a War Service Bill, and a Central Hiring Agency were among the problems 
considered. 


‘‘Vocational Guidance for Victory’’ is the title of an 80-page manual 
issued by the War Service Committee of the National Vocational Guidance 
Association. Including contributions by fifteen government officials deal- 
ing with the Nation’s manpower, the publication brings together informa- 
tion on all aspects of the American wartime labor market. Special 
attention is given to opportunities in the armed forces, including the 
operation of the Selective Service and Army Personnel Classification 
Systems. Employment and training opportunities in war industries are 
set forth. There are sections on the new jobs open to women and on the 
problems of rural youth, the physically handicapped, and minority groups. 
Edited by Dr. Harry D. Kitson, Editor of Occupations, single copies of 
the manual may be obtained for 50 cents from the National Vocational 
Guidance Association, 425 West 123rd Street, New York City. 


‘‘Teacher Education in a Democracy at War’’ is the title of a
paper-bound report issued by the American Council on Education and prepared 
by E. S. Evenden of Teachers College, Columbia University. The first 
chapter presents the general implications of the war with regard to the 
fundamentally opposed ideologies involved, the interrelationship of war 
and peace, and the consequent importance of social habits and under- 
standings. Later chapters summarize the lessons of 1917-18 in this coun- 
try and England, pointing out the failure to maintain educational stand- 
ards in the face of new responsibilities and the serious effect that war 
had on the supply of teachers. Post-war trends in the United States are 
followed as manifested in changes in the schools, the education of teach- 
ers, and the relation between the two. History seems to be repeating 
itself in the present emergency in that a teacher shortage is again 
threatened. Mr. Evenden discusses some educational ‘‘first things’’ and 
concludes with a list of specific recommendations addressed respectively to 
school systems, to colleges and universities, and to the public. 


The Committee on Tests of the Life Office Management Association has 
recently prepared a report on ‘‘The Application of Psychological Tests to 
the Selection, Placement, and Transfer of Clerical Employees.’’ It was the 
feeling of the Committee, of which Marion A. Bills is chairman and Leonard 
W. Ferguson, secretary, that ‘‘it should be of service to member companies 
in using and interpreting the scores on the various psychological tests that 
the Committee has recommended to facilitate the selection and guidance of 
clerical personnel.’’ Included among the topics presented appear: The 
functions of tests in initial selection, placement, and transfer of clerical 
employees; The varieties of tests having practical value in office personnel 
procedure; What a personnel director has to know about tests; Criteria 
of success in clerical work; Test score interpretation; Methods by which 
a small company may secure the advantages of a large-scale testing 
program. 

The manual may be purchased through the Life Office Management 
Association, 110 East 42nd Street, New York City, at the price of $2.50. 
DAVIS, EDWIN W., A Functional Pattern Technique for Classification of 
Jobs. Teachers College, Columbia University. Contributions to
Education No. 844, Bureau of Publications, Teachers College, Columbia 
University. New York, 1942, pp. x + 128. 


In his statement of the problem the author gives 13 bases for job classi- 
fication. Most of these he characterizes, in the words of Charters, as 
‘‘logical and structural.’’ The 13th basis of classification, which is the 
one used in this investigation, is in terms of functions performed by the 
worker. The writer says ‘‘Functions have been used previously by both 
Charters and Uhrbrock but never before has the entire pattern of func- 
tions performed by each individual been tabulated and the distribution 
of positions in these patterns been studied.’’ This the author undertakes 
to do for occupations in the field of advertising. 

The data were derived from a national survey conducted by the Adver- 
tising Federation of America, covering men and women in all branches 
of advertising. From the results of this survey vocational history sched- 
ules or questionnaires were selected for 4,989 men, which were used to 
demonstrate the proposed technique of the study. 

By a method of coding for Hollerith cards, involving assigning a set 
of a geometric series of numbers or terms to each of five sets of
functions, it was possible to derive a unique code number for each different 
pattern. Functional patterns are analyzed in terms of four general func- 
tions, buying, selling, creating and performing, and twenty-five special 
functions. 
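
A minimal sketch of how such a coding scheme might work is given below. It assumes the geometric series is the powers of two, which the review does not state, and it uses only the four general functions named above. The names `WEIGHTS`, `encode`, and `decode` are illustrative.

```python
# The four general functions named in the review.
FUNCTIONS = ["buying", "selling", "creating", "performing"]

# Assumption: the geometric series is 1, 2, 4, 8 (ratio 2). With this
# ratio, every combination of functions sums to a different number.
WEIGHTS = {name: 2 ** i for i, name in enumerate(FUNCTIONS)}


def encode(performed):
    """Return the code number for the set of functions a worker performs."""
    return sum(WEIGHTS[name] for name in set(performed))


def decode(code):
    """Recover the list of functions from a code number."""
    return [name for name in FUNCTIONS if code & WEIGHTS[name]]


example = encode(["buying", "creating"])
print(example, decode(example))
```

Because each weight is double the one before it, no two different patterns can share a sum. A property of this kind is what allows one punched-card field to identify a whole pattern.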

The following steps, illustrated in the monograph by charts and tables, 
were used in developing the functional pattern technique: 

1. Devise a method of coding and code each advertising man’s com- 
bination or pattern of functions performed in his position so that both 
unique and similar sets of functions can be easily identified and tabulated. 

2. Tabulate the frequency of each pattern of functions in order to dif- 
ferentiate between rare patterns with five or fewer persons in them and 
common patterns with six or more persons in them. 

3. Make a frequency distribution of these common general functional 
patterns according to positions and types of businesses in order to deter- 
mine the extent to which common patterns represent all functional pat- 
terns in each position or business. 

4. Arrange the common specific functional patterns in order of their 
frequency and determine the number of men per position in each. This 
is the means of identifying the modal functional pattern of each position. 

5. Determine the extent to which these modal patterns are typical of 
the common patterns of their respective positions and of the businesses in 
which the positions are found. 


6. Make a classification of the positions based on identical or similar 
modal specific functional patterns. 
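
The tabulation steps above can be sketched roughly as follows. The position names and pattern codes are invented for illustration and are not the survey's data. Only the cut-off of five persons for a rare pattern comes from the text.

```python
from collections import Counter, defaultdict

RARE_MAX = 5  # from the text: rare patterns contain five or fewer persons


def classify_positions(records):
    """records: list of (position, pattern_code) pairs, one per person."""
    # Step 2: frequency of each pattern; keep only the common ones.
    overall = Counter(code for _, code in records)
    common = {code for code, n in overall.items() if n > RARE_MAX}

    # Steps 3-4: count common patterns within each position and take
    # the most frequent one as that position's modal pattern.
    by_position = defaultdict(Counter)
    for position, code in records:
        if code in common:
            by_position[position][code] += 1
    modal = {pos: counts.most_common(1)[0][0]
             for pos, counts in by_position.items()}

    # Step 6: group positions that share the same modal pattern.
    classes = defaultdict(list)
    for position, code in modal.items():
        classes[code].append(position)
    return overall, common, modal, dict(classes)


# Hypothetical records: (position, pattern code).
records = (
    [("copywriter", 4)] * 7 + [("copywriter", 6)] * 2
    + [("space buyer", 1)] * 6 + [("space buyer", 4)] * 1
)
overall, common, modal, classes = classify_positions(records)
print(modal)
print(classes)
```

Step 5, judging how typical each modal pattern is, would compare the modal count with the total of common patterns in the same position. It is left out here for brevity.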

The advantages of this method as compared with structural methods 
of classification of jobs are pointed out, and possible future uses of the 
method and problems for further research are indicated. 

Amos C. ANDERSON, 
Ohio University 


DEARBORN, W. F., AND ROTHNEY, JOHN W. M. Predicting the Child’s 
Development. Sci-Art Publishers, Harvard Square, Mass. 


In the fall of 1922 the senior author and his associates in the
Psycho-Educational Clinic of the Harvard Graduate School of Education
inaugurated the third Harvard Growth Study. ‘‘Approximately 3500 children 
who were entering the first grade of three cities of the metropolitan area 
of Boston were examined. In addition to 12 annually repeated physical 
measurements, a battery of mental and scholastic tests was administered 
annually to these same children for as long a time as they remained in 
school.’’ 

Among the advantages of the longitudinal approach to the study of 
growth may be mentioned these: norms or averages of growth may
indicate only slight changes although the changes are considerable, and may 
show nothing of the changes occurring within the individual; practically 
every youngster makes within a year or so a more intense spurt of growth 
than is shown in the average. ‘‘Further, it can be shown that these 
periods of most rapid growth in height and weight, and in many other 
physical (and perhaps mental) dimensions, coincide with the advent of 
the menarche in girls and of pubescence in boys, and that the earlier this 
event comes in the life-age of the individual, the more rapid and intense 
are the changes produced. Also, the onset of these changes—which are 
doubtless due to underlying endocrine factors—can be determined two or 
three years before their chief effect is attained.’’ 

Growth from about the age of six to maturity has three phases: A 
period of deceleration, a period of rapid acceleration, and a period of 
about two years of rather rapid deceleration, ending in maturity. If the 
most intense growth occurs between 9.5 and 10.5, growth will be
practically finished at age 14. But if growth is greatest in the year 13.5 to 
14.5, full growth is almost attained by age 17. At age 19 the first group 
is the shorter, contrary to the belief of many. On the basis of these data, 
we can make a ‘‘certain amount of prognosis of the probable rate of 
growth in individual cases between ages 10 and 18, and a prediction of 
the age at which sexual maturity will likely be attained.’’ 
The well-known height-weight chart hanging in every school principal’s 
office may now be taken down. ‘‘By including measures of the depth 
and width of the body, the senior authors have devised an equation’’ 
to determine the extent of over- or under-weight. Nor should parents 
worry any more about a possible mental lag during a child’s period of 
intense growth. ‘‘There is no significant relationship between the rapidity 
of physical development and scholastic achievement.’’ 

Probably chapter four is the most interesting in the book. The scores 
of the different measurements of mental growth did not mean the same, 
so the results of the different tests were equated to the Stanford Revision 
of the Binet Scale by a procedure devised by Ratcliff. The authors then 
wished to find the average curve of mental growth, to select a good- 
fitting mathematical curve for the same, and make a study of differential 
growth trends. 

To construct a growth curve, an adequate metric must be found. The 
scores on mental tests at year 8 were selected, a mean of zero and a 
standard deviation of 1 were arbitrarily assigned, and by using the 
Thurstone technique the mental age measures were converted into a 
substantially absolute unit. The Gompertz mathematical formula, $y = k\,g^{c^{x}}$,
was found to agree closely with the data, and a ‘‘Harvard Growth Study 
Curve’’ was calculated. An interesting feature of this curve is that 
mental growth continues to 30 years of age. 
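
For readers unfamiliar with the Gompertz curve, the sketch below evaluates its standard three-parameter form, y = k·g^(c^x). The symbols k, g, and c follow common convention and may not match the authors' notation. The parameter values are hypothetical, chosen only to show the shape of the curve, and are not the fitted Harvard Growth Study values, which the review does not report.

```python
def gompertz(x: float, k: float, g: float, c: float) -> float:
    """Standard Gompertz curve y = k * g**(c**x).

    With 0 < g < 1 and 0 < c < 1 the curve rises toward the asymptote k.
    """
    return k * g ** (c ** x)


def fraction_of_maximum(x: float, g: float, c: float) -> float:
    """Share of the asymptote k reached at x (k cancels out)."""
    return g ** (c ** x)


# Hypothetical parameters for illustration only.
K, G, C = 100.0, 0.01, 0.85

ages = [7, 10, 13, 16, 20, 30]
curve = [gompertz(age, K, G, C) for age in ages]
shares = [fraction_of_maximum(age, G, C) for age in ages]

for age, share in zip(ages, shares):
    print(f"age {age:2d}: {share:.0%} of the extrapolated maximum")
```

A curve of this form approaches its upper limit only gradually. This is why a fit to data from the school years can imply continued growth well beyond them, and why that implication depends on extrapolation.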

Since the data range from year 7 to 16, the remaining distance of the 
curve must be extrapolated. ‘‘But the matter is not as bad as it seems 
at first. These 8 years, from 7 to 16, include, roughly, growth from about 
the 12th to the 86th percentile. Thus about three-fourths of the total 
growth is covered by this range. It is unfortunate that the maximum 
must be inferred from extrapolation. ...’’ Of especial significance to 
colleges and graduate schools is this period of mental growth. 

Another tradition, namely, that mental growth stops at age 15 or 16, 
is exploded. And, if it is possible to determine that a given individual 
has attained 60% of his growth while another has achieved only 40% in 
the same length of time, then students of guidance have a valuable tool. 
From school records growth curves may be plotted which will indicate 
growth trends, and aid in diagnosis and remedial treatment. 

When several physical characteristics were correlated with mental de- 
velopment as measured by gains in arithmetic and reading ability, the 
results were interpreted to mean only a chance relationship. If, however, 
growth is calculated in terms of percentage of growth attained at age 
18 years, the picture is changed and ‘‘there appears to be a greater 
correspondence between mental and physical growth.’’ Considerable 
space is given to a discussion of a regression equation by which ‘‘normal 
weight for body build’’ can be determined. But its practicability is 
questionable. 
The volume emphasizes marked individual differences among all the 
groups measured, and ‘‘marked variability in individual growth curves 
appear throughout the course of the growth period.’’ ‘‘This principle 
of individual variability goes right to the root of such problems as the 
constancy of the I.Q., the use of height-weight tables, the prediction of 
time of maturity, prediction of age at which growth will cease and 
various similar problems.’’ 

The applications of the formula for developing a growth curve and 
predicting normal weight are important additions to the rest of the data 
(many of which have been published in other volumes) of this book. 

J. R. Gentry, 
Ohio University 